Abstract

In this paper we present a simple but powerful subgraph sampling primitive that is applicable in a 
variety of computational models including dynamic graph streams (where the input graph is defined by a 
sequence of edge/hyperedge insertions and deletions) and distributed systems such as MapReduce. In the 
case of dynamic graph streams, we use this primitive to prove the following results: 

• Matching: First, there exists an $\tilde{O}(k^2)$ space algorithm that returns an exact maximum matching
on the assumption the cardinality is at most $k$. The best previous algorithm used $\tilde{O}(kn)$ space
where $n$ is the number of vertices in the graph, and we prove our result is optimal up to logarithmic
factors. Our algorithm has $\tilde{O}(1)$ update time. Second, there exists an $\tilde{O}(n^2/\alpha^3)$ space algorithm
that returns an $\alpha$-approximation for matchings of arbitrary size. (Assadi et al. [9] showed that this
was optimal and independently and concurrently established the same upper bound.) We generalize
both results for weighted matching. Third, there exists an $\tilde{O}(n^{4/5})$ space algorithm that returns
a constant approximation in graphs with bounded arboricity. While there has been a substantial
amount of work on approximate matching in insert-only graph streams, these are the first non-trivial
results in the dynamic setting.

• Vertex Cover and Hitting Set: There exists an $\tilde{O}(k^d)$ space algorithm that solves the minimum
hitting set problem where $d$ is the cardinality of the input sets and $k$ is an upper bound on the size
of the minimum hitting set. We prove this is optimal up to logarithmic factors. Our algorithm has
$\tilde{O}(1)$ update time. The case $d = 2$ corresponds to minimum vertex cover.

Finally, we consider a larger family of parameterized problems (including b-matching, disjoint paths,
vertex coloring, among others) for which our subgraph sampling primitive yields fast, small-space dynamic
graph stream algorithms. We then show lower bounds for natural problems outside this family.
1 Introduction


Over the last decade, a growing body of work has considered solving graph problems in the data stream model. 
Most of the early work considered the insert-only variant of the model where the stream consists of edges 
being added to the graph and the goal is to compute properties of the graph using limited memory. Recently, 
however, there has been a significant amount of interest in being able to process dynamic graph streams where 
edges are both added and deleted from the graph [?]. These algorithms are all based
on the surprising efficacy of using random linear projections, aka linear sketching, for solving combinatorial
problems. Results include testing edge connectivity [?] and vertex connectivity [25], constructing spectral
sparsifiers [30], approximating the densest subgraph [10], correlation clustering [?], and estimating the
number of triangles [?]. For a recent survey of the area, see [?].

The concept of parameterized stream algorithms was explored by Chitnis et al. and Fafianie 
and Kratsch [19]. Their work investigated a natural connection between data streams and parameterized
complexity. In parameterized complexity, the time cost of a problem is analyzed in terms of not only the 
input size but also other parameters of the input. For example, while the classic vertex cover problem is
NP-complete, it can be solved via a simple branching algorithm in time $2^k \cdot \mathrm{poly}(n)$ where $k$ is the size of the
optimal vertex cover. An important concept in parameterized complexity is kernelization in which the goal is 
to efficiently transform an instance of a problem into a smaller instance such that the smaller instance is a 
“yes” instance (e.g., has a solution of at least a certain size) iff the original instance was also a “yes” instance. 
For more background on parameterized complexity and kernelization, see [?]. Parameterizing the space
complexity of a problem in terms of the size of the output is a particularly appealing notion in the context of 
data stream computation. In particular, the space used by any algorithm that returns an actual solution (as 
opposed to an estimate of the size of the solution) is necessarily at least the size of the solution. 


Our Results and Related Work. In this paper we present a simple but powerful subgraph sampling primitive 
that is applicable in a variety of computational models including dynamic graph streams (where the input 
graph is defined by a sequence of edge/hyperedge insertions and deletions) and distributed systems such as 
MapReduce. This primitive will be useful for both parameterized problems whose output has bounded size 
and for solving problems where the optimal solution need not be bounded. In the case where the output has 
bounded size, our results can be thought of as kernelization via sampling, i.e., we sample a relatively small 
set of edges according to a simple (but not uniform) sampling procedure and can show that the resulting 
graph has a solution of size at most k iff the original graph has an optimal solution of size at most k. We 
present the subgraph sampling primitive and implementation details in Section 2.


Graph Matchings. Finding a large matching is the most well-studied graph problem in the data stream 
model [?]. However, all of the existing single-pass stream algorithms
are restricted to the insert-only case, i.e., edges may be inserted but will never be deleted. This restriction is 
significant: for example, the simple greedy algorithm using $O(n)$ space returns a 2-approximation if there
are no deletions. In contrast, prior to this paper no o(n)-approximation was known in the dynamic case when 
there are both insertions and deletions. Finding an algorithm for the dynamic case of this fundamental graph 
problem was posed as an open problem in the Bertinoro Data Streams Open Problem List [1].

In Section 3 we prove the following results for computing matchings in the dynamic model. Our first
result is an $\tilde{O}(k^2)$ space algorithm that returns a maximum matching on the assumption that its cardinality is
at most $k$. Our algorithm has $\tilde{O}(1)$ update time. The best previous algorithm [11] was the folklore algorithm
that collects $\min(\deg(u), 2k)$ edges incident to each vertex $u$ and finds the optimal matching amongst
these edges. This algorithm can be implemented in $\tilde{O}(kn)$ space where $n$ is the number of vertices in the
graph. Indeed, obtaining an algorithm with $f(k)$ space, for any function $f$, in the dynamic graph stream
case remained an important open problem [?]. We can also extend our approach to maximum weighted
matching. Our second result is an $\tilde{O}(n^2/\alpha^3)$ space algorithm that returns an $\alpha$-approximation for matchings
of arbitrary size. For example, this implies an $O(n^{1/3})$-approximation using $\tilde{O}(n)$ space, commonly known as
the semi-streaming space restriction [20, 40]. Our third result is an $\tilde{O}(n^{4/5})$ space algorithm that returns a
constant approximation in graphs with bounded arboricity (such as planar graphs). This result builds upon an
approach taken by Esfandiari et al. [18] for the problem on insert-only graph streams.

Vertex Cover and Hitting Set. We next consider the problem of finding the minimum vertex cover and its 
generalization, minimum hitting set. The hitting set problem can be defined in terms of hypergraphs: given a 
set of hyperedges, select the minimum set of vertices such that every hyperedge contains at least one of the 
selected vertices. If all hyperedges have cardinality two, this is the vertex cover problem. 

There is a growing body of work analyzing hypergraphs in the data stream model [41–43].

For example, Emek and Rosén [15] studied the following set-cover problem, which is closely related to the
hitting set problem: given a stream of hyperedges (without deletions), find the minimum subset of these
hyperedges such that every vertex is included in at least one of the hyperedges. They present an $O(\sqrt{n})$-approximation
streaming algorithm using $\tilde{O}(n)$ space along with results for covering all but a small fraction
of the vertices. Another related problem is independent set, since the minimum vertex cover is the complement
of the maximum independent set. Halldórsson et al. [26] presented streaming algorithms for finding large
independent sets but these do not imply a result for vertex cover in either the insert-only or dynamic setting. 

In Section 4 we present an $\tilde{O}(k^d)$ space algorithm that finds the minimum hitting set where $d$ is the
cardinality of the input sets and $k$ is an upper bound on the cardinality of the minimum hitting set. We prove
the space use is optimal and matches the space used by previous algorithms in the insert-only model [?].
Our algorithms can be implemented with $\tilde{O}(1)$ update time. The only previous results in the dynamic model
were by Chitnis et al. [11] and included an $\tilde{O}(kn)$ space algorithm and an $\tilde{O}(k^2)$ space algorithm under a much
stronger “promise” that the vertex cover of the graph defined by any prefix of the stream may never exceed
$k$. Relaxing this promise remained the main open problem of Chitnis et al. [11]. In Section 4 we also
generalize our exact matching result to hypergraphs. In a later section we show our result is also optimal.

General Family of Results. Finally, we consider a larger family of parameterized problems for which
our subgraph sampling primitive yields fast, small-space dynamic graph stream algorithms. This result is
presented in a later section, while lower bounds for various problems outside this family are proved in a separate section.


1.0.1 Recent Work on Approximate Matching 

Two other groups have independently and concurrently made progress on the problem of designing algorithms
that approximate the size of the maximum matching in the dynamic graph stream model [9, ?]. These
are relevant only to our second result on matching (Section 3.2). Specifically, Assadi et al. [9] showed that
it was possible to $\alpha$-approximate the maximum matching using $\tilde{O}(n^2/\alpha^3)$ space; this matches our result.
Furthermore, they also showed that this was near-optimal. Konrad [?] proved slightly weaker bounds.


2 Basic Subgraph Sampling Technique 

Basic Approach and Intuition. The inspiration for our subgraph sampling primitive is the following simple 
procedure for edge sampling. Given a graph $G = (V, E)$ and probability $p \in [0, 1]$, let $\mu_{G,p}$ be the distribution
on $E \cup \{\perp\}$ defined by the following process:

1. Sample each vertex independently with probability $p$ and let $V'$ denote the set of sampled vertices.
2. Return an edge chosen uniformly at random from the edges in the induced graph on $V'$. If no such
edge exists, return $\perp$.
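The two-step process defining $\mu_{G,p}$ can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: the graph is a plain edge list, a seeded `random.Random` supplies the randomness, and `None` stands in for $\perp$; the function name `sample_mu` is hypothetical.

```python
import random
from typing import Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def sample_mu(vertices: List[Hashable], edges: List[Edge], p: float,
              rng: random.Random) -> Optional[Edge]:
    """Draw one sample from mu_{G,p}; None plays the role of the symbol ⊥."""
    # Step 1: keep each vertex independently with probability p (this is V').
    kept = {v for v in vertices if rng.random() < p}
    # Step 2: pick a uniformly random edge of the subgraph induced on V'.
    induced = [(u, v) for (u, v) in edges if u in kept and v in kept]
    if not induced:
        return None  # the induced graph has no edge: return ⊥
    return rng.choice(induced)
```

For instance, with `p = 1.0` every vertex is kept, so a non-empty graph always yields some edge, while `p = 0.0` always yields `None`.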

The distribution $\mu_{G,p}$ has some surprisingly useful properties. For example, suppose that the optimal
matching in a graph $G$ has size at most $k$. It is possible to show that this matching has the same size as the
optimal matching in the graph formed by taking $O(k^2)$ independent samples from $\mu_{G,p}$ for a suitable $p$. It is not hard to
show that such a result would not hold if the edges were sampled uniformly at random.* The intuition is that
when we sample from $\mu_{G,p}$ we are less likely to sample an edge incident to a high degree vertex than if we
sampled uniformly at random from the edge set. For a large family of problems including matching, it will be
advantageous to avoid bias towards edges whose endpoints have high degree.

Our subgraph sampling primitive essentially parallelizes the process of sampling from $\mu_{G,p}$. This will
lead to more efficient algorithms in the dynamic graph stream model. The basic idea is that rather than selecting a
subset of vertices $V'$, we randomly partition $V$ into $V_1 \cup V_2 \cup \dots \cup V_{1/p}$. Selecting a random edge from the
induced graph on any $V_i$ results in an edge distributed as in $\mu_{G,p}$. Sampling an edge on each $V_i$ results in
$1/p$ samples from $\mu_{G,p}$, although note that the samples are no longer independent. This lack of independence
will not be an issue and will sometimes be to our advantage. In many applications it will make sense to
parallelize the sampling further and select a random edge between each pair, $V_i$ and $V_j$, of vertex subsets. For
applications involving hypergraphs we select random edges between larger subsets of $\{V_1, V_2, \dots, V_{1/p}\}$.
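A sketch of the partition-based variant, assuming `parts` plays the role of $1/p$ and that a seeded `random.Random` is an acceptable stand-in for the random partition (the name `partition_samples` is hypothetical). Each key of the returned dictionary is either $\{i\}$ (an edge inside $V_i$) or $\{i, j\}$ (an edge between $V_i$ and $V_j$).

```python
import random
from typing import Dict, FrozenSet, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def partition_samples(vertices: List[Hashable], edges: List[Edge], parts: int,
                      rng: random.Random) -> Dict[FrozenSet[int], Edge]:
    """Randomly split V into `parts` groups V_1..V_parts (think parts = 1/p) and
    keep one uniformly random edge inside each V_i and between each pair V_i, V_j."""
    group = {v: rng.randrange(parts) for v in vertices}
    buckets: Dict[FrozenSet[int], List[Edge]] = {}
    for (u, v) in edges:
        # A key of size 1 means the edge lies inside one group; size 2 means it
        # runs between two groups.
        key = frozenset((group[u], group[v]))
        buckets.setdefault(key, []).append((u, v))
    return {key: rng.choice(bucket) for key, bucket in buckets.items()}
```

All samples share one partition, which is the loss of independence the paragraph describes.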

Sampling Data Structure. We now present the subgraph sampling primitive formally. Given an unweighted
graph $G = (V, E)$, consider a “coloring” defined by a function $c : V \to [b]$. It will be convenient to introduce
the notation:

$V_c = \{v \in V : c(v) = c\}$

and we will say that every vertex in $V_c$ has color $c$. For $S \subseteq [b]$, we say an edge or hyperedge $e$ of $G$ is
$S$-colored if $c(e) = S$, where $c(e) = \{c(u) : u \in e\}$ is the set of colors used to color vertices in $e$. Given this
coloring and a constant $d \ge 1$, let $G' = (V, E')$ be a random subgraph where

$E' = \bigcup_{S \subseteq [b] : |S| \le d} E_S$

and $E_S$ contains a single edge chosen uniformly from the set of $S$-colored edges (or $E_S = \emptyset$ if there are none).
In the case of a weighted graph, for each distinct weight $w$, $E_S$ contains a single edge chosen uniformly from
the set of $S$-colored edges with weight $w$.

Definition 1. We define $\mathrm{Sample}_{b,d,1}$ to be the distribution over subgraphs generated as above where $c$
is chosen uniformly at random from a family of pairwise independent hash functions. $\mathrm{Sample}_{b,d,r}$ is the
distribution over graphs formed by taking the union of $r$ independent graphs sampled from $\mathrm{Sample}_{b,d,1}$.
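A direct, in-memory reading of Definition 1 is sketched below. Assumptions: vertices are small non-negative integers, hyperedges are tuples, and the hash family is the Carter–Wegman construction $x \mapsto ((ax + c) \bmod P) \bmod b$, which is pairwise independent into $\mathbb{Z}_P$ but only approximately uniform on $[b]$ after the final reduction. The names `make_hash`, `sample_once` and `sample` are hypothetical, and the weighted variant would additionally key each class by the edge weight.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple

HyperEdge = Tuple[int, ...]
PRIME = 2_147_483_647  # 2^31 - 1, assumed larger than every vertex id


def make_hash(b: int, rng: random.Random):
    """Carter-Wegman style colouring x -> ((a*x + c) mod PRIME) mod b."""
    a = rng.randrange(1, PRIME)
    c = rng.randrange(PRIME)
    return lambda x: ((a * x + c) % PRIME) % b


def sample_once(edges: List[HyperEdge], b: int, d: int,
                rng: random.Random) -> Set[HyperEdge]:
    """One draw from Sample_{b,d,1}: for every colour set S with |S| <= d,
    keep one uniformly random S-coloured (hyper)edge."""
    color = make_hash(b, rng)
    classes: Dict[FrozenSet[int], List[HyperEdge]] = {}
    for e in edges:
        s = frozenset(color(v) for v in e)  # c(e), the set of colours on e
        if len(s) <= d:
            classes.setdefault(s, []).append(e)
    return {rng.choice(bucket) for bucket in classes.values()}


def sample(edges: List[HyperEdge], b: int, d: int, r: int,
           seed: int = 0) -> Set[HyperEdge]:
    """Sample_{b,d,r}: union of r independent draws from Sample_{b,d,1}."""
    rng = random.Random(seed)
    out: Set[HyperEdge] = set()
    for _ in range(r):
        out |= sample_once(edges, b, d, rng)
    return out
```

With $b = 1$ every edge falls into the single class $\{0\}$, so one draw keeps exactly one edge.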

Motivating Application. As a first application to motivate the subgraph sampling primitive we again consider 
the problem of estimating matchings. We will use the following simple lemma that will also be useful in 
subsequent sections (the proof, along with other omitted proofs, can be found in Appendix A).

Lemma 2. Let $U \subseteq V$ be an arbitrary subset of $|U| = r$ vertices and let $c : V \to [4r\epsilon^{-1}]$ be a pairwise
independent hash function. Then with probability at least 3/4, at least $(1 - \epsilon)r$ of the vertices in $U$ are hashed
to distinct values. Setting $\epsilon < 1/r$ ensures all vertices are hashed to distinct values with this probability.
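One way to see where the constant 3/4 comes from is a Markov-inequality count of colliding pairs. This is a sketch under the lemma's own assumptions, not necessarily the argument given in Appendix A. Let $X$ be the number of pairs $\{u, v\} \subseteq U$ with $c(u) = c(v)$; pairwise independence gives $\Pr[c(u) = c(v)] = 1/b$ for $b = 4r/\epsilon$, so

```latex
\mathbb{E}[X] = \binom{r}{2}\frac{1}{b} \le \frac{r^2}{2}\cdot\frac{\epsilon}{4r} = \frac{\epsilon r}{8},
\qquad
\Pr\left[X \ge \frac{\epsilon r}{2}\right] \le \frac{\epsilon r/8}{\epsilon r/2} = \frac{1}{4}.
```

When $X < \epsilon r/2$, at most $2X < \epsilon r$ vertices of $U$ share their color with another vertex of $U$, so at least $(1-\epsilon)r$ vertices receive a color used by no other vertex of $U$. If $\epsilon < 1/r$, then $X < \epsilon r/2 < 1/2$ forces $X = 0$.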

*To see this, consider a layered graph on vertices $L_1 \cup L_2 \cup L_3 \cup L_4$ with edges forming a complete bipartite graph on $L_1 \times L_2$,
a complete bipartite matching on $L_2 \times L_3$, and a perfect matching on $L_3 \times L_4$. If $|L_1| = k$ and $|L_2| = |L_3| = |L_4| = k/2$,
then the maximum matching has size $k$ and every maximum matching includes all edges in the perfect matching on $L_3 \times L_4$. Since there are
$\Omega(nk)$ edges in this graph, we would need $\Omega(nk)$ edges sampled uniformly before we find the matching on $L_3 \times L_4$.
Suppose $G$ is a graph with a matching $M = \{e_1, \dots, e_k\}$ of size $k$. Let $G' \sim \mathrm{Sample}_{b,2,1}$. By the above
lemma, there exists $b = O(k^2)$ such that all the $2k$ endpoints of edges in $M$ are colored differently with
constant probability. Suppose the endpoints of edge $e_i$ received the colors $a_i$ and $b_i$. Then $G'$ contains an
edge in $E_{\{a_i, b_i\}}$ for each $i \in [k]$. Assuming all endpoints receive different colors, no edge in $E_{\{a_i, b_i\}}$ shares
an endpoint with an edge in $E_{\{a_j, b_j\}}$ for $j \ne i$. Hence, we can conclude that $G'$ also has a matching of size $k$.
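The argument above can be checked on a toy instance. The sketch assumes a fully random coloring (a dict) in place of a pairwise independent hash and uses hypothetical helper names; it keeps one random edge of each class $E_{\{a_i, b_i\}}$ and confirms that the kept edges form a matching of size $k$ whenever the $2k$ endpoints of $M$ received distinct colors.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]


def matching_from_coloring(graph_edges: List[Edge], matching: List[Edge],
                           color: Dict[int, int],
                           rng: random.Random) -> List[Edge]:
    """For each matching edge e_i with endpoint colours {a_i, b_i}, keep one
    uniformly random edge coloured {a_i, b_i}, i.e. one edge of E_{a_i,b_i}."""
    by_class: Dict[FrozenSet[int], List[Edge]] = {}
    for (u, v) in graph_edges:
        by_class.setdefault(frozenset((color[u], color[v])), []).append((u, v))
    picked = []
    for (u, v) in matching:
        # e_i itself lies in this class, so the class is never empty.
        picked.append(rng.choice(by_class[frozenset((color[u], color[v]))]))
    return picked


def is_matching(edges: List[Edge]) -> bool:
    """True if no two edges share an endpoint."""
    seen: Set[int] = set()
    for (u, v) in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True
```

A shared endpoint would need a color lying in both $\{a_i, b_i\}$ and $\{a_j, b_j\}$, which are disjoint, so `is_matching` holds on the picked edges.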

In a later section we show that a similar approach can be generalized to a range of problems. Using a similar
argument there exists $b = O(k)$ such that $G'$ contains a constant approximation to the optimum matching.
However, in Section 3.1 we show that there exists $b = O(k)$ such that with high probability graphs sampled
from $\mathrm{Sample}_{b,2,O(\log k)}$ preserve the size of the optimal matching exactly.

2.1 Application to Dynamic Data Streams and MapReduce 

We now describe how the subgraph sampling primitive can be implemented in various computational models. 

Dynamic Graph Streams. Let $S$ be a stream of insertions and deletions of edges of an underlying graph
$G = (V, E)$. We assume that the vertex set is $V = \{1, 2, \dots, n\}$. We assume that the length of the stream is polynomially
related to $n$ and hence $O(\log |S|) = O(\log n)$. We denote an undirected edge in $E$ with two endpoints
$u, v \in V$ by $uv$. For weighted graphs, we assume that the weight of an edge is specified when the edge is
inserted and deleted and that the weight never changes. The following theorem establishes that the sampling
primitive can be efficiently implemented in dynamic graph streams. 

Theorem 3. Suppose $G$ is a graph with $w_0$ distinct weights. It is possible to sample from $\mathrm{Sample}_{b,d,r}$ with
probability at least $1 - \delta$ in the dynamic graph stream model using $\tilde{O}(b^d r w_0)$ space and $\tilde{O}(r)$ update time.
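The theorem is stated here without its data structure. A standard way to keep one surviving edge per color class under insertions and deletions is $\ell_0$-sampling; the sketch below is a simplified illustration of that idea, not the construction behind Theorem 3. Assumptions: vertices are $0, \dots, n-1$, an edge is deleted only after it was inserted, edge $uv$ is encoded as the integer $un + v$, and all class and method names are hypothetical. Every recovered edge is guaranteed to be present, but the choice within a class is only roughly uniform, and a class can fail to return an edge with constant probability per copy, which the `copies` parameter reduces.

```python
import random
from typing import Dict, List, Optional, Tuple

P = (1 << 61) - 1  # Mersenne prime used by the hash functions


class OneSparse:
    """Count and id-sum; they reveal the item when exactly one is present.
    Assumes every multiplicity is 0 or 1 (no deletion before insertion)."""

    def __init__(self) -> None:
        self.count = 0
        self.idsum = 0

    def update(self, item: int, delta: int) -> None:
        self.count += delta
        self.idsum += delta * item

    def recover(self) -> Optional[int]:
        return self.idsum if self.count == 1 else None


class L0Sampler:
    """Level j sees only items whose hash is below P / 2^j (nested subsampling).
    A query returns the item of any level that holds exactly one item."""

    def __init__(self, levels: int, rng: random.Random) -> None:
        self.a, self.c = rng.randrange(1, P), rng.randrange(P)
        self.levels = [OneSparse() for _ in range(levels + 1)]

    def update(self, item: int, delta: int) -> None:
        h = (self.a * item + self.c) % P
        for j, level in enumerate(self.levels):
            if h >= (P >> j):
                break  # filtered out here, hence at every deeper level too
            level.update(item, delta)

    def sample(self) -> Optional[int]:
        for level in self.levels:
            item = level.recover()
            if item is not None:
                return item
        return None


class DynamicColourSampler:
    """Sketch of one draw of Sample_{b,2,1} under edge insertions and
    deletions: one group of L0Samplers per colour class {c(u), c(v)}."""

    def __init__(self, n: int, b: int, copies: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        a, c = self.rng.randrange(1, P), self.rng.randrange(P)
        self.color = lambda v: ((a * v + c) % P) % b
        self.n, self.copies = n, copies
        self.levels = (n * n).bit_length()
        self.classes: Dict[Tuple[int, ...], List[L0Sampler]] = {}

    def update(self, u: int, v: int, delta: int) -> None:
        """delta = +1 inserts edge uv, delta = -1 deletes it."""
        u, v = min(u, v), max(u, v)
        key = tuple(sorted((self.color(u), self.color(v))))
        if key not in self.classes:
            self.classes[key] = [L0Sampler(self.levels, self.rng)
                                 for _ in range(self.copies)]
        for s in self.classes[key]:
            s.update(u * self.n + v, delta)  # encode edge uv as one integer id

    def sample(self) -> List[Tuple[int, int]]:
        """At most one surviving edge per colour class."""
        out = []
        for samplers in self.classes.values():
            for s in samplers:
                item = s.sample()
                if item is not None:
                    out.append(divmod(item, self.n))
                    break
        return out
```

The memory is one small counter pair per (color class, copy, level), i.e. roughly $b^2 \cdot \text{copies} \cdot \log n$ counters for $d = 2$, in line with the $b^d$ factor in the theorem.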

MapReduce and Distributed Models. The sampling distribution is naturally parallel, making it straightforward
to implement in a variety of popular models. In MapReduce, the $r$ hash functions can be shared state
among all machines, allowing the Map function to output each edge keyed by its color under each hash function.
Then, these can be sampled from on the Reduce side to generate the graph $G'$. Optimizations can do some
data reduction on the Map side, so that only one edge per color class is emitted, reducing the communication
cost. A similar outline holds for other parallel graph models such as Pregel.
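The MapReduce outline can be simulated in memory as below. Assumptions: edges are pairs of integer vertex ids, $d = 2$, the shuffle phase is a dictionary, and the hash parameters are broadcast to every mapper; all function names are hypothetical. The map-side reduction mentioned above is omitted; to keep the final choice uniform, such a combiner would have to forward one sampled edge together with the size of its class so the reducer can weight its choice.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Set, Tuple

Edge = Tuple[int, int]
Key = Tuple[int, frozenset]
PRIME = 2_147_483_647


def make_hashes(r: int, b: int, seed: int) -> List[Callable[[int], int]]:
    """The r colourings; in a real job these parameters are shared with all mappers."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(r)]
    return [lambda x, a=a, c=c: ((a * x + c) % PRIME) % b for a, c in params]


def map_edge(edge: Edge, hashes: List[Callable[[int], int]]) -> Iterator[Tuple[Key, Edge]]:
    """Map: emit the edge once per hash function, keyed by (hash index, colour set)."""
    u, v = edge
    for t, h in enumerate(hashes):
        yield (t, frozenset((h(u), h(v)))), edge


def reduce_class(values: List[Edge], rng: random.Random) -> Edge:
    """Reduce: keep one uniformly random edge of a colour class."""
    return rng.choice(values)


def run_job(edges: List[Edge], r: int, b: int, seed: int = 0) -> Set[Edge]:
    """Simulate map, shuffle and reduce; the union of reduced edges is G'."""
    hashes = make_hashes(r, b, seed)
    shuffled: Dict[Key, List[Edge]] = defaultdict(list)  # plays the shuffle phase
    for e in edges:
        for key, value in map_edge(e, hashes):
            shuffled[key].append(value)
    rng = random.Random(seed + 1)
    return {reduce_class(vals, rng) for vals in shuffled.values()}
```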

3 Matchings and Vertex Cover 

In this section, we present results on finding the maximum matching and minimum vertex cover of a graph $G$.
We use $\mathrm{match}(G)$ to denote the size of the maximum (weighted or unweighted as appropriate) matching in
$G$ and use $\mathrm{vc}(G)$ to denote the size of the minimum vertex cover.

3.1 Finding Small Matchings and Vertex Covers Exactly 

The main theorem we prove in this section is:

Theorem 4 (Finding Exact Solutions). Suppose $\mathrm{match}(G) \le k$. Then, with probability $1 - 1/\mathrm{poly}(k)$,

$\mathrm{match}(G') = \mathrm{match}(G)$ and $\mathrm{vc}(G') = \mathrm{vc}(G)$ ,
where $G' = (V, E') \sim \mathrm{Sample}_{100k,2,O(\log k)}$.
Intuition and Preliminaries. To argue that $G'$ has a matching of the optimal size, it suffices to show that for
every edge $uv \in G$ that is not in $G'$, there are a large number of edges incident to one or both of $u$ and $v$ that
are in $G'$. If this is the case, then it will still be possible to match at least one of these vertices in $G'$.

To make this precise, let $U$ be the subset of vertices with degree at least $10k$. Let $F$ be the set of edges in
the induced subgraph on $V \setminus U$, i.e., the set of edges whose endpoints both have small degree. We will prove
that with high probability, 

$(F \subseteq E')$ and $(\forall u \in U,\ \deg_{G'}(u) \ge 5k)$ ,   (1)

where $E'$ is the set of edges in $G'$. Note that any sampled graph $G'$ that satisfies this equation has the property
that for all edges $uv \in G$ that are not in $G'$ we have $\deg_{G'}(u) \ge 5k$ or $\deg_{G'}(v) \ge 5k$.
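For small graphs, condition (1) can be checked mechanically. The helper below (a hypothetical name, assuming each undirected edge is stored once, with the same orientation, in both $E$ and $E'$) computes $U$ and $F$ exactly as defined above and tests both conjuncts.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def degrees(edges: Iterable[Edge]) -> Counter:
    """Degree of every vertex that appears in some edge."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def satisfies_property_1(E: Set[Edge], E_prime: Set[Edge], k: int) -> bool:
    """Check (F ⊆ E') and (deg_{G'}(u) >= 5k for all u in U), where
    U = {vertices of degree >= 10k in G} and F = edges with no endpoint in U."""
    deg_G, deg_Gp = degrees(E), degrees(E_prime)
    U = {v for v, d in deg_G.items() if d >= 10 * k}
    F = {(u, v) for (u, v) in E if u not in U and v not in U}
    return F <= E_prime and all(deg_Gp[u] >= 5 * k for u in U)
```

For example, with $k = 1$, a star with 10 leaves plus one disjoint edge satisfies (1) for a subgraph keeping 5 star edges and the disjoint edge, but not for one keeping only 4 star edges.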

Analysis. The first lemma establishes that it is sufficient to prove that (1) holds with high probability.

Lemma 5. If $\mathrm{match}(G) \le k$ then (1) implies $\mathrm{match}(G') = \mathrm{match}(G)$ and $\mathrm{vc}(G') = \mathrm{vc}(G)$.

The next lemma establishes that (1) holds with the required probability.

Lemma 6. Eq. (1) holds with probability at least $1 - 1/\mathrm{poly}(k)$.

Proof. First note that $\mathrm{match}(G) \le k$ implies that there exists a vertex cover $W$ of size at most $2k$ because
the endpoints of the edges in a maximum matching form a vertex cover. Next consider $H \sim \mathrm{Sample}_{100k,2,1}$.
We will show that for any $e \in F$ and $u \in U$,

$\Pr[e \in H] \ge 1/2$ and $\Pr[\deg_H(u) \ge 5k] \ge 1/2$ .

It follows that if $r = O(\log k)$ and $G' \sim \mathrm{Sample}_{100k,2,r}$ then

$\Pr[e \in G' \text{ and } \deg_{G'}(u) \ge 5k] \ge 1 - 1/\mathrm{poly}(k)$ .

We then take the union bound over the $O(k^2)$ edges in $F$ and the $O(k)$ vertices in $U$. The fact that
$|F| = O(k^2)$ and $|U| = O(k)$ follows from the promises $\mathrm{match}(G) \le k$ and $\mathrm{vc}(G) \le 2k$. In particular, the
induced graph on $V \setminus U$ has a matching of size $\Omega(|F|/k)$ since its maximum degree is $O(k)$, and this matching has size at
most $k$. Since all vertices in $U$ must be in the minimum vertex cover (a vertex of degree at least $10k > 2k$ left out of the cover would force all of its neighbors into it), $|U| \le 2k$.
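The union bound can be made concrete with a back-of-the-envelope calculation; the constants below are ours. Every vertex outside $U$ has degree below $10k$, so each edge of a greedy maximal matching in the induced graph on $V \setminus U$ accounts for fewer than $20k$ edges of $F$, giving $|F| < 20k \cdot \mathrm{match}(G) \le 20k^2$. Since each of the $r$ independent copies of $H$ succeeds with probability at least 1/2 for a fixed edge or vertex,

```latex
\Pr[\text{(1) fails}]
  \le \sum_{e \in F} 2^{-r} + \sum_{u \in U} 2^{-r}
  \le \left(20k^2 + 2k\right) 2^{-r}
  \le 22k^2 \cdot k^{-c}
  \qquad \text{for } r = c \log_2 k,
```

which is $1/\mathrm{poly}(k)$ for any constant $c > 2$ (and $k \ge 2$).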

To prove $\Pr[e \in H] \ge 1/2$: Let the endpoints of $e$ be $x$ and $y$. Consider the pairwise independent hash function
$c : [n] \to [b]$ that defined $H$, where $b = 100k$. If $c(x) \ne c(y)$ and $c(x), c(y) \notin A$ where

$A = \{c(w) : w \in (W \cup \Gamma(x) \cup \Gamma(y)),\ w \notin \{x, y\}\}$, where $\Gamma(\cdot)$ denotes the set of neighbors of a vertex,

then $xy$ is the unique $\{c(x), c(y)\}$-colored edge and is therefore in $H$. This follows because any $\{c(x), c(y)\}$-colored edge
must be incident to a vertex in $W$ since $W$ is a vertex cover. However, the only vertices in $V_{c(x)}$ or $V_{c(y)}$
that are in $W$ are one or both of $x$ and $y$, and aside from the edge $xy$ none of the edges incident to either $x$ or $y$
are $\{c(x), c(y)\}$-colored. Since $b = 100k$ and $|A| \le 2k + 10k + 10k = 22k$,

$\Pr[xy \in H] \ge 1 - \Pr[c(x) = c(y)] - \Pr[c(x) \in A] - \Pr[c(y) \in A] \ge 1 - 1/b - 2|A|/b \ge 1 - \frac{1}{100k} - \frac{44}{100} > 1/2$ .

To prove $\Pr[\deg_H(u) \ge 5k] \ge 1/2$: Let $N_u$ be an arbitrary set of $10k$ neighbors of $u$. If $c(u) \notin \{c(w) :
w \in W \setminus \{u\}\}$ and there exist different colors $c_1, \dots, c_{5k}$ such that each $c_i \in \{c(v) : v \in N_u\} \setminus \{c(w) :
w \in W\}$, then these color pairings are unique to their edges, and the algorithm returns at least $5k$ edges
incident to $u$. This follows since every edge has at least one endpoint in $W$.

First note that $\Pr[c(u) \in \{c(w) : w \in W \setminus \{u\}\}] \le 2k/b$. By appealing to Lemma 2, with probability
at least 3/4, there are at least $6k$ colors used to color the vertices $N_u$. Of these colors, at least $5k$ are
different from the colors of vertices in $W$. Hence we find $5k$ edges incident to $u$ with probability at least
$3/4 - 2k/b > 1/2$. □
Extension to Weighted Matching. We now extend the result of the previous section to the weighted case.
The following lemma shows that it is possible to remove an edge uv from a graph without changing the 
weight of the maximum weighted matching, if u and v satisfy certain properties. 

Lemma 7. Let $G = (V, E)$ be a weighted graph and let $G' = (V, E')$ be a subgraph with the property:

$\forall uv \in E \setminus E',\ \deg_{G'}^{w(uv)}(u) \ge 5k$ or $\deg_{G'}^{w(uv)}(v) \ge 5k$ ,

where $\deg_G^{w}(u)$ is the number of edges incident to $u$ in $G$ with weight $w$. Then, $\mathrm{match}(G) = \mathrm{match}(G')$.

Consider a weighted graph $G$ and let $G' \sim \mathrm{Sample}_{100k,2,O(\log k)}$. For each weight $w$, let $G_w$ and $G'_w$
denote the subgraphs consisting of edges with weight exactly $w$. By applying the analysis of the previous
section to $G_w$ and $G'_w$, we may conclude that $G'$ satisfies the properties of the above lemma. Hence,
$\mathrm{match}(G) = \mathrm{match}(G')$. To reduce the dependence on the number of distinct weights in Theorem 3, we
may first round each weight to the nearest power of $(1 + \epsilon)$ at the cost of incurring a $(1 + \epsilon)$ factor error. If
$W$ is the ratio of the max weight to min weight, there are $O(\epsilon^{-1} \log W)$ distinct weights after the rounding.
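The rounding step can be sketched as follows, rounding each weight down to a power of $(1+\epsilon)$ (one convenient reading of "nearest"); the function names are hypothetical. Rounding down keeps every weight within a factor $(1+\epsilon)$ of its original value, and weights spanning a ratio $W$ occupy only about $\log W / \log(1+\epsilon) = O(\epsilon^{-1}\log W)$ consecutive exponents.

```python
import math
from typing import List


def round_weight(w: float, eps: float) -> float:
    """Round w > 0 down to a power of (1 + eps), so that r <= w < (1 + eps) * r
    (up to floating-point error)."""
    i = math.floor(math.log(w) / math.log(1 + eps))
    return (1 + eps) ** i


def weight_classes(weights: List[float], eps: float) -> int:
    """Number of distinct weights that remain after rounding."""
    return len({round_weight(w, eps) for w in weights})
```

For example, with $\epsilon = 0.1$ the weights 1 and 1.05 both round to 1, so the list 1, 1.05, 2, 100 leaves three weight classes.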


3.2 Finding Large Matchings Approximately 

We next show our graph sampling primitive yields an approximation algorithm for estimating large matchings. 

Intuition and Preliminaries. Given a hash function $c : V \to [b]$, we say an edge $uv$ is colored $i$ if
$c(u) = c(v) = i$. If the endpoints have different colors, we say the edge is uncolored. The basic idea behind
our algorithm is to repeatedly sample a set of colored edges with distinct colors. Note that a set of edges
colored with different colors is a matching. We use the edges in this matching to augment the matching
already constructed from previous rounds. In this section we require the hash functions to be $O(k)$-wise
independent and, in the context of dynamic data streams, this will increase the update time by an $O(k)$ factor.
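A sketch of the round structure just described, under simplifying assumptions: the coloring is fully random rather than $O(k)$-wise independent, the whole edge list is available in memory (in the streaming setting each round would instead read the sampled graph $H_t$), and the function names are hypothetical. Each round adds one edge per color that no matched vertex uses.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def augment_round(edges: List[Edge], matching: List[Edge], b: int,
                  rng: random.Random) -> List[Edge]:
    """One round: colour the vertices, then for every colour i that no matched
    vertex uses, add one random edge coloured i (both endpoints coloured i)."""
    color = {v: rng.randrange(b) for e in edges for v in e}  # fully random (assumption)
    used = {color[v] for e in matching for v in e}
    classes: Dict[int, List[Edge]] = {}
    for (u, v) in edges:
        if color[u] == color[v] and color[u] not in used:
            classes.setdefault(color[u], []).append((u, v))
    # Edges of different colours are vertex-disjoint, and their endpoints are
    # unmatched because their colour is not used by any matched vertex.
    return matching + [rng.choice(bucket) for bucket in classes.values()]


def greedy_from_samples(edges: List[Edge], b: int, rounds: int,
                        seed: int = 0) -> List[Edge]:
    """Build M_1, M_2, ... by running one augmentation round per sample H_t."""
    rng = random.Random(seed)
    matching: List[Edge] = []
    for _ in range(rounds):
        matching = augment_round(edges, matching, b, rng)
    return matching
```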

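Theorem 8. Suppose $\mathrm{match}(G) \ge k$. For any $1 \le \alpha \le \sqrt{k}$ and $0 < \epsilon < 1$, with probability $1 - 1/\mathrm{poly}(k)$,

$\mathrm{match}(G') \ge \frac{1 - \epsilon}{2\alpha} \cdot k$ ,

where $G' \sim \mathrm{Sample}_{2k/\alpha,1,r}$ and $r = O(k\alpha^{-2}\epsilon^{-2}\log k)$.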

Proof. Let $H_1, \dots, H_r \sim \mathrm{Sample}_{2k/\alpha,1,1}$ and let $G'$ be the union of these graphs. Consider the greedy
matching $M_r$ where $M_0 = \emptyset$ and for $t \ge 1$, $M_t$ is the union of $M_{t-1}$ and additional edges from $H_t$. We will
show that if $M_{t-1}$ is small, then we can find many edges in $H_t$ that can be used to augment $M_{t-1}$.

Consider $H_t$ and suppose $|M_{t-1}| < \frac{1-\epsilon}{2\alpha} k$. Let $c : V \to [b]$ be the hash function used to define $H_t$,
where $b = 2k/\alpha$. Let $U$ be the set of colors that are not used to color the endpoints of $M_{t-1}$, i.e.,

$U = \{c \in [b] : \text{there does not exist a matched vertex } u \text{ in } M_{t-1} \text{ with } c(u) = c\}$ ,

and note that $|U| \ge b - 2|M_{t-1}| \ge k/\alpha$. For each $c \in U$, define the indicator variable $X_c$ where $X_c = 1$ if
there exists an edge $uv$ with $c(u) = c(v) = c$. We will find $X = \sum_{c \in U} X_c$ edges to add to the matching.

Since $\mathrm{match}(G) \ge k$, there exists a set of $k - 2|M_{t-1}| \ge k\epsilon$ vertex-disjoint edges that can be added to
$M_{t-1}$. Let $p = 1/b = \alpha/(2k)$ and observe that

$\mathbb{E}[X_c] \ge k\epsilon p^2 - \binom{k\epsilon}{2} p^4 \ge k\epsilon p^2/2 = \epsilon\alpha^2/(8k)$ .
Therefore, $\mathbb{E}[X] \ge (k/\alpha) \cdot \epsilon\alpha^2/(8k) = \epsilon\alpha/8$. Since $X_c$ and $X_{c'}$ are negatively correlated,
$\Pr[X \ge \mathbb{E}[X]/2] \ge 1 - \exp(-\Omega(\epsilon\alpha)) \ge \Omega(\epsilon)$. Hence, with each repetition we may increase the size of the matching by at least
$\epsilon\alpha/16$ with probability $\Omega(\epsilon)$. After $O(k\alpha^{-2}\epsilon^{-2}\log k)$ repetitions the matching has size at least $\frac{1-\epsilon}{2\alpha} \cdot k$. □

By applying Theorem 8 for all $k \in \{1, 2, 4, 8, 16, \dots\}$ and appealing to Theorem 3, we establish:

Corollary 9. There exists an $O(n \,\mathrm{polylog}\, n)$-space algorithm that returns an $O(n^{1/3})$-approximation to the
size of the maximum matching in the dynamic graph stream model.

This result generalizes to the weighted case using the Crouch–Stubbs technique [?]. They showed that if
we can find a $1/\beta$-approximation to the maximum cardinality matching amongst all edges of weight greater
than $(1 + \epsilon)^i$ for each $i$, then we can find a $2(1 + \epsilon)/\beta$-approximation to the maximum weighted matching in
the original graph.

3.3 Matchings in Planar and Bounded-Arboricity Graphs 

We also provide an algorithm for estimating the size of the matching in a graph of bounded arboricity. Recall
that a graph has arboricity $\nu$ if its edges can be partitioned into at most $\nu$ forests. Our result is as follows.

Theorem 10. There exists an $\tilde{O}(\nu \epsilon^{-2} n^{4/5} \log \delta^{-1})$-space dynamic graph stream algorithm that returns a
$(5\nu + 9)(1 + \epsilon)^2$ approximation of $\mathrm{match}(G)$ with probability at least $1 - \delta$, where $\nu$ is the arboricity of $G$.

The basic idea is to generalize the approach taken by Esfandiari et al. [18] in the insert-only case. This
can be achieved using sparse recovery sketches and our algorithm for small matchings. See the appendix for details.

4 Hitting Set and Hypergraph Matching 

In this section we present exact results for hitting set and hypergraph matching. Throughout the section, let $G$
be a hypergraph where each edge has size exactly $d$ and $\mathrm{hs}(G) \le k$. In the case where $d = 2$, the problems
under consideration are vertex cover and matching. Throughout this section we assume $d$ is a constant.

Intuition and Preliminaries. Given that the hitting set problem is a generalization of the vertex cover 
problem, it will be unsurprising that some of the ideas in this section build upon ideas from the previous 
section. However, the combinatorial structure we need to analyze for our sampling result goes beyond what is 
typically needed when extending vertex cover results to hitting set. We first need to review a basic definition 
and result about “sunflower” set systems [?].

Lemma 11 (Sunflower Lemma [17]). Let $X$ be a collection of subsets of $[n]$. Then $A_1, \dots, A_s \in X$ is an
$s$-sunflower if $A_i \cap A_j = C$ for all $1 \le i < j \le s$. We refer to $C$ as the core of the sunflower and $A_i \setminus C$ as
the petals. If each set in $X$ has size at most $d$ and $|X| > d!\,k^d$, then $X$ contains a $(k + 1)$-sunflower.
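The sunflower definition in Lemma 11 translates directly into a small checker. The names are hypothetical and this is a brute-force illustration, not part of the algorithm.

```python
import math
from itertools import combinations
from typing import FrozenSet, List, Optional, Set


def sunflower_core(sets: List[Set[int]]) -> Optional[FrozenSet[int]]:
    """Return the core C if every pairwise intersection equals the same set C
    (i.e. the sets form a sunflower), otherwise None. Needs at least two sets."""
    if len(sets) < 2:
        return None
    core = frozenset(sets[0] & sets[1])
    for a, b in combinations(sets, 2):
        if frozenset(a & b) != core:
            return None
    return core


def petals(sets: List[Set[int]]) -> Optional[List[Set[int]]]:
    """The petals A_i \\ C of a sunflower, or None if the sets do not form one."""
    core = sunflower_core(sets)
    return None if core is None else [s - core for s in sets]


def sunflower_threshold(d: int, k: int) -> int:
    """d! * k^d: more sets of size <= d than this force a (k+1)-sunflower."""
    return math.factorial(d) * k ** d
```

For example, $\{1,2,3\}, \{1,2,4\}, \{1,2,5\}$ form a 3-sunflower with core $\{1,2\}$, while $\{1,2\}, \{2,3\}, \{1,3\}$ do not form a sunflower.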

Let $s_G(C)$ denote the number of petals in a maximum sunflower in the graph $G$ with core $C$. We say a
core is large if $s_G(C) > ak$ for some large constant $a$ and significant if $s_G(C) > k$. Define the sets:

• $U = \{C \subseteq V \mid s_G(C) > ak\}$ is the set of large cores.

• $F = \{D \in E \mid \forall C \in U,\ C \not\subseteq D\}$ is the set of edges that do not include a large core.

• $U' = \{C \in U \mid \forall C' \subsetneq C,\ s_G(C') \le k\}$ is the set of large cores that do not contain significant cores.
The sets $U$ and $F$ play a similar role to the sets of the same name in the previous section. For example,
if $d = 2$, then a large core corresponds to a high degree vertex. However, the set $U'$ had no corresponding
notion when $d = 2$ because a high degree vertex cannot contain another high degree vertex. The following
bounds on $|F|$ and $|U'|$ are proved in Appendix A.

Lemma 12. $|F| = O(k^d)$ and $|U'| = O(k^{d-1})$.

The next lemma shows that if a core $C$ is contained in a set $D$, then the set of other edges $D'$ that intersect
$D$ at $C$ has a hitting set that (a) does not include vertices in $C$ and (b) has small size if $s_G(C)$ is small.

Lemma 13. For any two sets of vertices $C \subseteq D$, define

$M_{C,D} = \{D' \setminus C \mid D' \in E,\ D \cap D' = C\}$ .

Then $\mathrm{hs}(M_{C,D}) \le s_G(C)\, d$.
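To make $M_{C,D}$ concrete, the sketch below builds it from an edge list and computes $\mathrm{hs}$ by brute force. It assumes hyperedges are `frozenset`s of integers; the names are hypothetical and the exponential search is only meant for toy inputs.

```python
from itertools import combinations
from typing import FrozenSet, List


def m_cd(edges: List[FrozenSet[int]], C: FrozenSet[int],
         D: FrozenSet[int]) -> List[FrozenSet[int]]:
    r"""M_{C,D} = { D' \ C : D' in E, D ∩ D' = C }."""
    return [Dp - C for Dp in edges if (D & Dp) == C]


def min_hitting_set_size(sets: List[FrozenSet[int]]) -> int:
    """Brute-force minimum hitting set size; exponential, toy inputs only."""
    if not sets:
        return 0
    universe = sorted(set().union(*sets))
    for size in range(len(universe) + 1):
        for cand in combinations(universe, size):
            chosen = set(cand)
            if all(s & chosen for s in sets):
                return size
    raise ValueError("an empty set cannot be hit")
```

For example, with hyperedges $\{1,2,3\}, \{1,4,5\}, \{1,6,7\}, \{1,2,8\}, \{4,6,9\}$, $D = \{1,2,3\}$ and $C = \{1\}$, `m_cd` returns $\{4,5\}$ and $\{6,7\}$, whose minimum hitting set has size 2, well within $s_G(C)\,d = 3 \cdot 3 = 9$.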

Hitting Set. For the rest of this section we let $G' = (V, E') \sim \mathrm{Sample}_{b,d,r}$ where $b = O(k)$, $d$ is the
cardinality of the largest hyperedge, and $r = O(\log k)$. Let $W$ be a minimum hitting set of $G$.

Theorem 14. Suppose $\mathrm{hs}(G) \le k$. With probability $1 - 1/\mathrm{poly}(k)$, $\mathrm{hs}(G') = \mathrm{hs}(G)$.

Proof. For each significant core $C$ there has to be at least one vertex from the hitting set in $C$. Since all large
cores are significant, $\mathrm{hs}(G) = \mathrm{hs}(U \cup F)$. If $C \in U$ has a subset $C' \subset C$ such that $s_G(C') > k$, then there
is at least one vertex from the hitting set in $C'$ and it also hits $C$. Thus, we only need to find significant cores
that do not contain other significant cores. Such sunflowers with more than $ak$ petals will be found according
to Lemma 15. Sunflowers with at most $ak$ petals will be found as a part of set $F$ according to Lemma 16. □

Lemma 15. $\Pr[s_{G'}(C) > k \text{ for all } C \in U'] \ge 1 - 1/\mathrm{poly}(k)$.

Proof. Fix an arbitrary core $C \in U'$. Consider $H \sim \mathrm{Sample}_{b,d,1}$ and let $c : [n] \to [b]$ be the coloring that
defined $H$. We need to identify sets $S_1, S_2, \dots, S_{k+1} \subseteq [b]$, each of size $d$, with the following three properties:

1. All edges that are $S_i$-colored contain $C$.

2. There is at least one $S_i$-colored edge.

3. If $D$ is $S_i$-colored and $D'$ is $S_j$-colored for $i \ne j$, then $(D \setminus C) \cap (D' \setminus C) = \emptyset$.

Let $D_1, D_2, \dots, D_{k+1}$ be any set of edges where $D_i$ is $S_i$-colored. Then these sets form a sunflower of size
$k + 1$ on core $C$. It will suffice to show that there exists such a family $S_1, S_2, \dots, S_{k+1} \subseteq [b]$ with probability
at least 1/2, because repeating the process $O(\log k)$ times will ensure that such a family exists with high
probability. The result then follows by taking the union bound over all $C \in U'$ since $|U'| = O(k^{d-1})$.

Property 1. We say $S \subseteq [b]$ is good if all $S$-colored edges contain $C$. We first define a set of vertices $A$ such
that all edges disjoint from $A$ include $C$. Then any $S$ such that $S \cap c(A) = \emptyset$ will be good, since if $c(D) = S$
for some edge $D$ then $S \cap c(A) = \emptyset \Rightarrow c(D) \cap c(A) = \emptyset \Rightarrow D \cap A = \emptyset$, and so $C \subseteq D$. Let

$A = (W \setminus C) \cup \left(\bigcup_{C' \subsetneq C} \mathrm{hs}(M_{C',C})\right)$ ,

where $W$ is a minimum hitting set and, by a slight abuse of notation, we use $\mathrm{hs}(M_{C',C})$ to denote a minimum
hitting set of $M_{C',C}$. Note that $\mathrm{hs}(M_{C',C})$ does not include any vertices in $C$. Since $W$ is a hitting set, all
edges that do not intersect $W \setminus C$ must intersect with $C$. But all edges that intersect $C$ in only a proper subset,
say $C'$, must intersect with $\mathrm{hs}(M_{C',C})$. Hence $A$ has the claimed property.

Properties 2 and 3. Next, let $\mathcal{P}$ be a set of petals in a sunflower with core $C$ that do not intersect with $A$.
We may choose a set of $|\mathcal{P}| = ak - |A|$ such petals. We will show later that $|A| = O(k)$, so we may assume
$|\mathcal{P}| = ak - |A| \ge 2(k + 1)$ for a sufficiently large constant $a$. For each $P \in \mathcal{P}$, define the set:

$A_P = A \cup C \cup \bigcup_{P' \in \mathcal{P} \setminus \{P\}} P'$ .

Let $\mathcal{P}'$ contain all $P \in \mathcal{P}$ such that $c(P) \cap c(A_P) = \emptyset$ and $|c(P)| = |P|$. Suppose $c(C) \cap c(A) = \emptyset$ and
$|c(C)| = |C|$. Then the family $\mathcal{S} = \{c(P \cup C)\}_{P \in \mathcal{P}'}$ satisfies Property 2.

To show $\mathcal{S}$ also satisfies Property 3, consider edges $C \cup Q_1$ and $C \cup Q_2$ such that $c(C \cup Q_1) = c(C \cup P_1)$
and $c(C \cup Q_2) = c(C \cup P_2)$. Then $c(Q_1) = c(P_1)$ and $c(Q_2) = c(P_2)$ because $|c(C)| = |C|$,
$|c(P_1)| = |P_1|$, and $|c(P_2)| = |P_2|$. But $c(P_1) \cap c(P_2) = \emptyset$ implies $c(Q_1) \cap c(Q_2) = \emptyset$ and so $Q_1 \cap Q_2 = \emptyset$.

Size of family $\mathcal{S}$. It will suffice to show that $c(C) \cap c(A) = \emptyset$ and $|c(C)| = |C|$ with probability 3/4, and
$|\mathcal{P}'| \ge k + 1$ with probability 3/4. Then $\mathcal{S}$ satisfies all three properties and has size at least $k + 1$ with probability
1/2. Suppose $b = \max(4d(|A| + d),\ 8d(|A_P| + d))$. Then,


$$\Pr\big[c(C) \cap c(A) = \emptyset,\ |c(C)| = |C|\big] \ge 1 - (d|A| + d^2)/b \ge 3/4.$$


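For each $P \in \mathcal{P}$, let $X_P = 1$ if $c(P) \cap c(A_P) \neq \emptyset$ or $|c(P)| \neq |P|$, and $X_P = 0$ otherwise. Then $\mathbb{E}[\sum_P X_P] \le |\mathcal{P}|(d|A_P| + d^2)/b \le |\mathcal{P}|/8$. By applying the Markov inequality, $\Pr[\sum_P X_P \ge |\mathcal{P}|/2] \le 1/4$. Hence, $|\mathcal{P}'| = |\mathcal{P}| - \sum_P X_P \ge |\mathcal{P}|/2 \ge k+1$ with probability at least $3/4$.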

It remains to show that $b = O(k)$, where we omit dependencies on $d$. To do this, it suffices to show $|A| = O(k)$ and $|A_P| = O(k)$. By appealing to Lemma 13,

$$|A| \le |A_P| \le |A| + |C| + d|\mathcal{P}| \le |W| + \sum_{C' \subsetneq C} \mathrm{hs}(M_{C',C}) + |C| + d|\mathcal{P}| \le k + 2^d akd + d + dak = O(k).$$


Lemma 16. $\Pr[F \subseteq E'] \ge 1 - 1/\mathrm{poly}(k)$.

Proof. Pick an arbitrary edge $D \in F$. Consider $H \sim \mathrm{Sample}_{b,d,1}$ and let $c : [n] \to [b]$ be the coloring that defined $H$. We need to show that there is a unique edge that is $c(D)$-colored, since then $D$ is an edge in $H$. It suffices to show that this is the case with probability at least $1/2$, because repeating the process $O(\log k)$ times will ensure that $D$ is sampled with high probability. The result then follows by taking the union bound over all $D \in F$, since $|F| = O(k^d)$.

Let $S = c(D)$. We first define a set $A$ of vertices such that the only edge that is disjoint from $A$ is $D$. Then it follows that $D$ is the unique $S$-colored edge if $S \cap c(A) = \emptyset$: every other edge intersects $A$ and hence has a color in $c(A)$. We define $A$ as follows:

$$A = (W \setminus D) \cup \Big(\bigcup_{C \subsetneq D} \mathrm{hs}(M_{C,D})\Big),$$


where $W$ is a minimum hitting set and, by a slight abuse of notation, we use $\mathrm{hs}(M_{C,D})$ to denote a minimum hitting set of $M_{C,D}$. Note that $\mathrm{hs}(M_{C,D})$ does not include any vertices in $D$. If an edge is disjoint from $W \setminus D$, then it must intersect $D$ since $W$ is a hitting set. If there exists an edge $D'$ such that $D \cap D' = C \subsetneq D$, then $D'$ intersects $\mathrm{hs}(M_{C,D})$. Hence, the only edge that is disjoint from $A$ includes the vertices in $D$, and hence is equal to $D$ on the assumption that all edges have the same number of vertices.

It remains to show that $S \cap c(A) = \emptyset$ with probability at least $1/2$. If $b \ge 2d|A|$, then we have

$$\Pr[S \cap c(A) = \emptyset] \ge 1 - d|A|/b \ge 1/2.$$

Finally, note that $2d|A| = O(k)$ since $|A| \le |W| + \sum_{C \subsetneq D} \mathrm{hs}(M_{C,D}) \le k + 2^d akd = O(k)$, by appealing to Lemma 13 and using the fact that $s_G(C) \le ak$ for all $C \subsetneq D$ since $D \in F$. □

A result for hypergraph matching follows along similar lines. 

Theorem 17. Suppose $\mathrm{match}(G) \le k' = k/d$. With probability $1 - 1/\mathrm{poly}(k)$, $\mathrm{match}(G') = \mathrm{match}(G)$.


5 Sampling Kernels for Subgraph Search Problems 

Finally, we consider a class of problems where the objective is to search for a subgraph $H$ of $G(V, E)$ which satisfies some property $\mathcal{P}$. In the parametrized setting, we typically search for the largest $H$ which satisfies this property, subject to the promise that the size of any $H$ satisfying $\mathcal{P}$ is at most $k$. For concreteness, we assume the size is captured by the number of vertices in $H$, and our objective is to find a maximum cardinality satisfying subgraph. The sampling primitive $\mathrm{Sample}_{b,2,1}$ can be used here when $\mathcal{P}$ is preserved under vertex contraction: if $G'$ is a vertex contraction of $G$, then any subgraph $H$ of $G'$ satisfying $\mathcal{P}$ also satisfies $\mathcal{P}$ for $G$ (with vertices suitably remapped). Here, the vertex contraction of vertices $u$ and $v$ creates a new vertex whose neighbors are $\Gamma(u) \cup \Gamma(v)$. Many well-studied problems possess the required structure, including:

• b-matching, to find a (maximum cardinality) subgraph $H$ of $G$ such that the degree of each vertex in $H$ is at most $b$. In particular, the standard notion of matching in Section 3 is equivalent to 1-matching.

• k-colorable subgraph, to find a subgraph $H$ that is $k$-colorable. The maximum cardinality 2-colorable subgraph forms a max-cut, and more generally the maximum cardinality $k$-colorable subgraph is a max $k$-cut.

• other maximum subgraph problems, such as to find the largest subgraph that is a forest, has at least $c$ connected components, or is a collection of vertex-disjoint paths.

Theorem 18. Let $\mathcal{P}$ be a graph property preserved under vertex contraction. Suppose that the number of vertices in some optimum solution $\mathrm{opt}(G)$ is at most $k$. Let $G' \sim \mathrm{Sample}_{4k^2,2,1}(G)$. With constant probability, we can compute a solution $H$ for $\mathcal{P}$ from $G'$ that achieves $|H| = |\mathrm{opt}(G)|$.

Proof. We construct a contracted graph $G''$ from $G'$ based on the color classes used in the Sample operator: we contract all vertices that are assigned the same color by the hash function $c(\cdot)$. Fix an optimum solution $\mathrm{opt}(G)$ with at most $k$ vertices. Lemma ?? shows that for $b = 4k^2$, with constant probability all vertices involved in $\mathrm{opt}(G)$ are hashed into distinct color values. Hence, the subgraph $\mathrm{opt}(G)$ is a subgraph of $G''$: for any edge $e = (u, v) \in \mathrm{opt}(G)$, either the edge itself was sampled from the data structure, or else a different edge with the same color values was sampled, and so can be used interchangeably in $G''$. Hence, (the remapped form of) $\mathrm{opt}(G)$ persists in $G''$. By the vertex contraction property of $\mathcal{P}$, this means that a maximum cardinality solution for $\mathcal{P}$ in $G''$ is a maximum cardinality solution in $G$.

Note that for this application of the subgraph sampling primitive, it suffices to implement the sampling data structure with a counter for each pair of colors: any non-zero count corresponds to an edge in $G''$. □
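The counter-based implementation mentioned in the last paragraph can be sketched in Python as follows. This is a minimal offline illustration under our own assumptions, not the authors' implementation: the coloring is passed in explicitly (standing in for the hash function $c$), updates are `(u, v, delta)` triples with `delta` equal to +1 or -1, and the function name `contracted_graph` is hypothetical. It keeps one counter per unordered pair of distinct colors and returns the edge set of $G''$.

```python
from collections import defaultdict


def contracted_graph(colors, updates):
    """Return the edge set of G'' (pairs of distinct colors) from a dynamic stream.

    colors  -- colors[v] is the color c(v) of vertex v
    updates -- iterable of (u, v, delta), delta = +1 for an insertion, -1 for a deletion
    """
    count = defaultdict(int)  # one counter per unordered pair of distinct colors
    for u, v, delta in updates:
        cu, cv = colors[u], colors[v]
        if cu == cv:
            # Both endpoints are contracted into the same vertex of G''; vertex
            # contraction creates no self-loops, so the edge is dropped.
            continue
        count[(min(cu, cv), max(cu, cv))] += delta
    # Two contracted vertices are adjacent in G'' iff their net count is non-zero.
    return {pair for pair, c in count.items() if c != 0}


colors = [0, 0, 1, 2, 3]
updates = [(0, 2, +1), (1, 3, +1), (2, 3, +1), (3, 4, +1), (2, 3, -1)]
print(sorted(contracted_graph(colors, updates)))  # [(0, 1), (0, 2), (2, 3)]
```

In the example, the matching $\{(0,2), (3,4)\}$ of $G$ uses vertices of distinct colors, so it reappears in $G''$ as the disjoint color pairs $(0,1)$ and $(2,3)$.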
We can follow the same template laid out in Section 3.1 to generalize to the weighted case (e.g., where the objective is to find the subgraph satisfying $\mathcal{P}$ with the greatest total weight). We round each edge weight to the closest power of $(1 + \epsilon)$, which reduces the number of weight classes to $O(\epsilon^{-1} \log W)$ with a loss factor of $(1 + \epsilon)$, and then perform the sampling in parallel for each weight class.
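A small Python sketch of the weight-rounding step, assuming all weights lie in $[1, W]$. It rounds each weight down to a power of $(1+\epsilon)$ rather than to the closest one; either choice changes a weight by at most a factor $(1+\epsilon)$. The names `weight_class` and `split_by_weight_class` are illustrative only; each resulting class would then be fed to its own copy of the sampler.

```python
import math


def weight_class(w, eps):
    """Index j with (1 + eps)**j <= w < (1 + eps)**(j + 1), assuming w >= 1."""
    j = math.floor(math.log(w) / math.log1p(eps))
    # Floating-point error can put j off by one near exact powers; correct it.
    while (1 + eps) ** (j + 1) <= w:
        j += 1
    while (1 + eps) ** j > w:
        j -= 1
    return j


def split_by_weight_class(weighted_edges, eps):
    """Group (u, v, w) triples into one sub-stream per rounded weight (1 + eps)**j."""
    classes = {}
    for u, v, w in weighted_edges:
        classes.setdefault(weight_class(w, eps), []).append((u, v))
    return classes


# Weights in [1, W] fall into at most floor(log_{1+eps} W) + 1 = O(eps^-1 log W) classes.
W, eps = 1000, 0.1
print(len({weight_class(w, eps) for w in range(1, W + 1)}))
```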
A Omitted Proofs


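Proof of Lemma ??. Let $b = 4\epsilon^{-1} r$. For a vertex $u \in U$, let $I_u$ be the indicator random variable that equals one if there exists $u' \in U \setminus \{u\}$ such that $c(u) = c(u')$. Since $c$ is pairwise independent,

$$\mathbb{E}[I_u] \le \sum_{u' \in U \setminus \{u\}} \Pr[c(u) = c(u')] = \sum_{u' \in U \setminus \{u\}} 1/b \le r/b = \epsilon/4.$$

Let $I = \sum_{u \in U} I_u$, so that $\mathbb{E}[I] \le \epsilon r/4$. Then Markov's inequality implies $\Pr[I \ge \epsilon r] \le 1/4$.

□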
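As a numerical sanity check of this Markov argument, the Python sketch below enumerates every function of the family $h(x) = ((ax + c) \bmod p) \bmod b$ with $a, c \in \mathbb{Z}_p$ and computes $\Pr[I \ge \epsilon r]$ exactly. This family is our choice of illustration, not necessarily the hash function used in the paper; the final reduction mod $b$ makes it only approximately pairwise independent (for $p = 101$, $b = 16$ the collision probability is $641/10201 \approx 0.0628$ rather than $1/16$). With $|U| = r = 4$ and $b = 16 = 4\epsilon^{-1}r$ for $\epsilon = 1$, the lemma predicts $\Pr[I \ge 4] \le 1/4$.

```python
from fractions import Fraction


def colliding(U, a, c, p, b):
    """I: the number of u in U whose color equals that of some other u' in U."""
    colors = [((a * u + c) % p) % b for u in U]
    return sum(1 for col in colors if colors.count(col) > 1)


def tail_probability(U, p, b, threshold):
    """Exact Pr[I >= threshold] when (a, c) is uniform over Z_p x Z_p."""
    hits = sum(
        1
        for a in range(p)
        for c in range(p)
        if colliding(U, a, c, p, b) >= threshold
    )
    return Fraction(hits, p * p)


# r = |U| = 4, b = 16 = 4r/eps with eps = 1: the lemma predicts Pr[I >= 4] <= 1/4.
print(tail_probability([1, 2, 3, 4], p=101, b=16, threshold=4))
# Theorem 18's setting with k = 2 and b = 4k^2 = 16: probability of any collision.
print(tail_probability([3, 7], p=101, b=16, threshold=1))  # 641/10201
```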


Proof of Theorem ??. To sample a graph from $\mathrm{Sample}_{b,d,r}$, simply sample $r$ graphs from $\mathrm{Sample}_{b,d,1}$ in parallel. To draw a sample from $\mathrm{Sample}_{b,d,1}$, we employ one instance of an $\ell_0$-sampling primitive for each of the $O(b^d)$ edge colorings [12, 27]. Given a dynamic graph stream, the $\ell_0$-sampler returns FAIL with probability at most $\delta$. Otherwise, it returns an edge chosen uniformly at random amongst the edges that have been inserted and not deleted. If there are no such edges, the $\ell_0$-sampler returns NULL. The $\ell_0$-sampling primitive can be implemented using $O(\log^2 n \log \delta^{-1})$ bits of space and $O(\mathrm{polylog}\, n)$ update time. In some cases, we can make use of simpler deterministic data structures. For Theorem ??, we can replace the $\ell_0$-sampler with a counter and the exclusive-or of all the edge identifiers, since we only need to recover edges when they are unique within their color class. For Theorem 18, we only require a counter. In both cases, the space cost is reduced to $O(\log n)$.

At the start of the stream we choose a pairwise independent hash function $c : [n] \to [b]$. For each weight $w$ and subset $S \subseteq [b]$ of size $d$, this hash function defines a sub-stream corresponding to the $S$-colored edges of weight $w$. We then use $\ell_0$-sampling on each sub-stream to select a random edge from $E_S$. □
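The deterministic replacement for the $\ell_0$-sampler described above (a counter plus the exclusive-or of edge identifiers per color class) can be sketched in Python for the graph case $d = 2$. All of the following are our assumptions: vertices are integers in `range(n)`; the stream never inserts an edge that is already present or deletes one that is absent; edge $\{u, v\}$ with $u < v$ gets identifier $un + v$; and an explicitly stored random coloring stands in for the pairwise independent hash function. The class and method names are hypothetical.

```python
import random


class CountXorSampler:
    """Sketch of Sample_{b,2,1} with one (counter, XOR) pair per color class.

    An edge is recovered only when it is the unique surviving edge of its class.
    """

    def __init__(self, n, b, seed=0, colors=None):
        self.n = n
        if colors is None:
            # Assumption: an explicitly stored random coloring stands in for the
            # pairwise independent hash function c : [n] -> [b].
            rng = random.Random(seed)
            colors = [rng.randrange(b) for _ in range(n)]
        self.color = list(colors)
        self.count = {}  # color pair -> net number of edges
        self.xor = {}    # color pair -> XOR of identifiers of inserted/deleted edges

    def _edge_id(self, u, v):
        u, v = min(u, v), max(u, v)
        return u * self.n + v

    def update(self, u, v, delta):
        """Insert (delta = +1) or delete (delta = -1) the edge {u, v}."""
        cu, cv = self.color[u], self.color[v]
        if cu == cv:
            return  # color set of size 1: never S-colored for |S| = 2
        key = (min(cu, cv), max(cu, cv))
        self.count[key] = self.count.get(key, 0) + delta
        # An inserted-then-deleted edge is XORed twice and cancels out.
        self.xor[key] = self.xor.get(key, 0) ^ self._edge_id(u, v)

    def recover(self):
        """Return the edges that are currently alone in their color class."""
        edges = []
        for key, cnt in self.count.items():
            if cnt == 1:
                edges.append(divmod(self.xor[key], self.n))  # identifier -> (u, v)
        return sorted(edges)


s = CountXorSampler(n=6, b=4, colors=[0, 1, 2, 3, 0, 2])
for u, v in [(0, 1), (2, 3), (1, 2), (4, 5), (1, 4), (0, 4)]:
    s.update(u, v, +1)
s.update(2, 3, -1)
print(s.recover())  # [(1, 2), (4, 5)]
```

In the example, the class of colors $\{0, 1\}$ holds two edges and is not recovered, the class $\{2, 3\}$ is empty after the deletion, and edge $(0, 4)$ is ignored because both endpoints share color 0.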


Proof of Lemma ??. We first argue that $\mathrm{vc}(G') = \mathrm{vc}(G)$. Since the minimum vertex cover of $G$ has size at most $2k$, every vertex in $U$ must be in the minimum vertex cover of both $G$ and $G'$, because the degrees of such vertices in both graphs are strictly greater than $2k$. This follows because if a vertex in $U$ were not in the minimum vertex cover, then all its neighbors would need to be in the vertex cover.

We next argue that $\mathrm{match}(G') = \mathrm{match}(G)$. If property (??) is satisfied, then $G'$ contains a matching of size $\mathrm{match}(F) + |U| \ge \mathrm{match}(G)$, since we may choose the optimum matching in $F$ and then still be able to match every vertex in $U$. This follows because the optimum matching in $F$ "consumes" at most $2k$ potential endpoints, since $\mathrm{match}(G) \le k$. Hence, each of the (at most $2k$) vertices in $U$ can still be matched to one of more than $2k$ possible vertices. □


Proof of Lemma ??. Let $E \setminus E' = \{e_1, e_2, \ldots, e_t\}$ and let $G'_i$ be the graph formed by removing $\{e_1, \ldots, e_i\}$ from $G$. So $G'_0 = G$ and $G'_t = G'$. For the sake of contradiction, suppose $\mathrm{match}(G) > \mathrm{match}(G')$ and let $r$ be the minimal value such that $\mathrm{match}(G) > \mathrm{match}(G'_r)$.

By the minimality of $r$, $\mathrm{match}(G) = \mathrm{match}(G'_{r-1})$. Consider the maximum weight matching $M$ in $G'_{r-1}$. If $e_r \notin M$, then $\mathrm{match}(G) = \mathrm{match}(G'_{r-1}) = \mathrm{match}(G'_r)$ and we have a contradiction. If $e_r \in M$, let $u, v$ be the endpoints of $e_r$ and let the weight of $e_r$ be $w$. Without loss of generality, $\deg^w_{G'_r}(u) \ge \deg^w_{G'}(u) \ge 5k$ (counting only edges of weight $w$). Hence, there exists an edge $ux$ of weight $w$ in $G'_r$ where $x$ is not an endpoint in $M$. Therefore, the matching $(M \setminus \{e_r\}) \cup \{ux\}$ is contained in $G'_r$ and has the same weight as $M$. Hence, $\mathrm{match}(G) = \mathrm{match}(G'_{r-1}) = \mathrm{match}(G'_r)$ and we again have a contradiction. □

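Proof of Corollary ??. For $1 \le i \le \log n$, let $G'_i \sim \mathrm{Sample}_{b,2,r}$ where $r = O(\ldots \log k)$ and $b = 2^{i+1}/\alpha$. These graphs can be generated in $\tilde{O}(n^2/\alpha^3)$ space. For some $i$, $2^i \le \mathrm{match}(G) < 2^{i+1}$, and hence

$$\mathrm{match}(G'_i) = \Omega(\mathrm{match}(G)/\alpha). \qquad □$$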
Proof of Lemma 12. For the sake of contradiction assume $|F| > d!\,(ak)^d$. Then, by the Sunflower Lemma, $F$ contains an $(ak+1)$-sunflower. If the core of this sunflower is empty, $F$ has a matching of size $ak+1$ and therefore $G$ cannot have a hitting set of size at most $k$. If the sunflower has a non-empty core $C$, then some edge $D \in F$ contains $C$, which contradicts the definition of $F$. Therefore, $|F| \le d!\,(ak)^d$.

To prove $|U'| \le (d-1)!\,k^{d-1}$, first note that $|C'| \le d-1$ for all $C' \in U'$. For the sake of contradiction assume that $|U'| > (d-1)!\,k^{d-1}$. Then, by the Sunflower Lemma again, $U'$ contains a $(k+1)$-sunflower.

Note that it is a sunflower of cores, not hypergraph edges. Let $C_1, C_2, \ldots, C_{k+1}$ be the sets in the sunflower. Each of these sets has to contain at least one vertex of the minimum hitting set $W$. Therefore, if $C_1, C_2, \ldots, C_{k+1}$ are disjoint (i.e., the core of the sunflower is empty), $W$ must contain at least $k+1$ distinct vertices, so $G$ cannot have a hitting set of size at most $k$. If the sunflower has a non-empty core $C^*$, we will show that the union of the maximum sunflowers with cores $C_1, C_2, \ldots, C_{k+1}$ contains a sunflower with $k+1$ edges and core $C^* \subsetneq C_1 \in U'$. This contradicts the definition of $U'$, and therefore $|U'| \le (d-1)!\,k^{d-1} = O(k^{d-1})$. To construct the sunflower on $C^*$, for $i = 1, \ldots, k+1$, we pick an edge $D_i$ in the maximum sunflower with core $C_i$ such that $D_i \cap C_j = C^*$ for $j \neq i$ and $D_i \cap D_j = C^*$ for $j < i$. This is possible if $a$ is sufficiently large. □

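Proof of Lemma 13. Consider the size of the minimum hitting set of $M_{C,D}$. If $\mathrm{hs}(M_{C,D}) > s_G(C)\,d$, then $M_{C,D}$ has a matching of size greater than $s_G(C)$, since the vertices of any maximal matching form a hitting set and each set in $M_{C,D}$ has at most $d$ vertices. This matching together with the set $C$ forms a sunflower with core $C$ and more than $s_G(C)$ petals, which contradicts the definition of $s_G(C)$. Therefore, $\mathrm{hs}(M_{C,D}) \le s_G(C)\,d$ as claimed. □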

Proof of Theorem 17. Since the vertices of a maximum matching form a hitting set, $\mathrm{hs}(G) \le dk' = k$. Let $M$ be the maximum matching. The edges of $F \cap M$ are preserved in $G'$. Consider an edge $D \in M$ such that $C \subseteq D$ for some $C \in U'$. Then in $G'$ we can find (by Lemma 13) at least $k+1$ petals in a sunflower with core either $C$ itself or some $C' \subset C$. At most $k$ of those intersect $M \setminus \{D\}$. Therefore, there is still at least one edge we can pick for the matching. □


B Matchings in Planar and Bounded-Arboricity Graphs 


In this section, we present an algorithm for estimating the size of the matching in a graph of bounded arboricity. Recall that a graph has arboricity $\nu$ if its edges can be partitioned into at most $\nu$ forests. In particular, it can be shown that a planar graph has arboricity at most 3. We will make repeated use of the fact that the average degree of every subgraph of a graph with arboricity $\nu$ is at most $2\nu$.


Our algorithm is based on an insertion-only streaming algorithm due to Esfandiari et al. [18]. They first proved upper and lower bounds on the size of the maximum matching in a graph of arboricity $\nu$.


Lemma 19 (Esfandiari et al. [18]). For any graph $G$ with arboricity $\nu$, define a vertex to be heavy if its degree is at least $2\nu + 3$ and define an edge to be shallow if it is not incident to a heavy vertex. Then,

$$\frac{\max\{h, s\}}{2.5\nu + 4.5} \le \mathrm{match}(G) \le 2\max\{h, s\},$$

where $h$ is the number of heavy vertices and $s$ is the number of shallow edges.
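For concreteness, the quantities $h$ and $s$ of Lemma 19 can be computed offline as in the following Python sketch, a helper of our own that assumes a simple graph given as an edge list and a known arboricity bound `nu`.

```python
from collections import defaultdict


def heavy_and_shallow(edges, nu):
    """Return (h, s): heavy vertices have degree >= 2*nu + 3; shallow edges touch none."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    heavy = {x for x, d in deg.items() if d >= 2 * nu + 3}
    shallow = [(u, v) for u, v in edges if u not in heavy and v not in heavy]
    return len(heavy), len(shallow)


print(heavy_and_shallow([(0, i) for i in range(1, 7)], nu=1))  # (1, 0)
print(heavy_and_shallow([(0, 1), (1, 2), (2, 3)], nu=1))      # (0, 3)
```

A star with six leaves (arboricity 1, maximum matching 1) has one heavy vertex and no shallow edges, while a path on four vertices (maximum matching 2) has no heavy vertices and three shallow edges; both satisfy the bounds of Lemma 19.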

To estimate $\max\{h, s\}$, Esfandiari et al. sampled a set of vertices $Z$ and (a) computed the exact degree of these vertices, then (b) found the set of all edges in the induced subgraph on these vertices. The fraction of heavy vertices in $Z$ and of shallow edges in the induced graph are then used to estimate $h$ and $s$. By choosing the size of $Z$ appropriately, they showed that the resulting estimate was sufficiently accurate on the assumption that $\max\{h, s\}$ is large. In the case where $\max\{h, s\}$ is small, the maximum matching is also small, and hence a maximal matching could be constructed in small space using a greedy algorithm.

Algorithm for Dynamic Graph Streams. In the dynamic graph stream model, it is not possible to construct a maximal matching in small space. However, we may instead use the algorithm of Theorem ?? to find the exact size of the maximum matching when it is small. Furthermore, we can still recover the induced subgraph on the sampled vertices $Z$ via a sparse recovery sketch [12]. This can be done space-efficiently because the number of edges is at most $2\nu|Z|$. Lastly, rather than fixing the size of $Z$, we sample each vertex independently with a fixed probability, as this simplifies the analysis significantly. The resulting algorithm is as follows:

1. Invoke the algorithm of Theorem ?? for $k = 2n^{2/5}$ and let $r$ be the reported matching size.

2. In parallel, sample each vertex independently with probability $p$ and let $Z$ be the set of sampled vertices. Compute the degrees of the vertices in $Z$ and maintain a $2\nu|Z|$-sparse recovery sketch of the edges in the induced graph on $Z$. Let $s_Z$ be the number of shallow edges in the induced graph on $Z$ and let $h_Z$ be the number of heavy vertices in $Z$. Return $\max\{r, h_Z/p, s_Z/p^2\}$.
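An offline Python simulation of the value returned in step 2 may help make the scaling by $p$ and $p^2$ concrete. It is a sketch under our own assumptions: the whole edge list is available (so no sparse recovery sketch is needed), `r` is supplied by the caller in place of the output of the exact matching algorithm, and the function name is hypothetical.

```python
import random
from collections import defaultdict


def estimate_matching_size(edges, n, nu, p, r, seed=0):
    """Return max{r, h_Z/p, s_Z/p^2} for a random vertex sample Z (offline simulation)."""
    rng = random.Random(seed)
    Z = {v for v in range(n) if rng.random() < p}  # keep each vertex w.p. p
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    heavy = {v for v in range(n) if deg[v] >= 2 * nu + 3}
    h_Z = len(heavy & Z)
    # Shallow edges (no heavy endpoint) of the subgraph induced on Z.
    s_Z = sum(1 for u, v in edges
              if u in Z and v in Z and u not in heavy and v not in heavy)
    return max(r, h_Z / p, s_Z / p ** 2)


star = [(0, i) for i in range(1, 7)]
print(estimate_matching_size(star, n=7, nu=1, p=1.0, r=1))
```

With `p = 1.0` every vertex is sampled and the function returns $\max\{r, h, s\}$ exactly; for smaller `p`, $h_Z/p$ and $s_Z/p^2$ are unbiased estimates of $h$ and $s$, whose concentration is the subject of the analysis that follows.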


Analysis. Our analysis relies on the following lemma, which shows that $\max\{h_Z/p, s_Z/p^2\}$ is a $(1 + \epsilon)$-approximation for $\max\{s, h\}$ on the assumption that $\max\{s, h\} \ge n^{2/5}$.

Lemma 20. $\Pr\big[\,|\max\{h_Z/p, s_Z/p^2\} - \max\{s, h\}| \le \epsilon \cdot \max\{n^{2/5}, s, h\}\,\big] \ge 4/5$.

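Proof. First we show that $s_Z/p^2$ is a sufficiently good estimate for $s$. Let $S$ be the set of shallow edges in $G$ and let $E_Z$ be the set of edges in the induced graph on $Z$. For each shallow edge $e \in S$, define an indicator random variable $X_e$, where $X_e = 1$ iff $e \in E_Z$, and note that $s_Z = \sum_{e \in S} X_e$. Then,

$$\mathbb{E}[s_Z] = sp^2 \quad\text{and}\quad \mathbb{V}[s_Z] = \sum_{e \in S} \sum_{e' \in S} \big(\mathbb{E}[X_e X_{e'}] - \mathbb{E}[X_e]\,\mathbb{E}[X_{e'}]\big).$$

Note that

$$\mathbb{E}[X_e X_{e'}] - \mathbb{E}[X_e]\,\mathbb{E}[X_{e'}] = \begin{cases} p^2 - p^4 & \text{if } e = e', \\ p^3 - p^4 & \text{if } e \text{ and } e' \text{ share exactly one endpoint}, \\ 0 & \text{if } e \text{ and } e' \text{ share no endpoints}, \end{cases}$$

and since there are at most $4\nu + 2$ edges that share exactly one endpoint with a shallow edge (each endpoint of a shallow edge has degree at most $2\nu + 2$),

$$\mathbb{V}[s_Z] \le s\big(p^2 - p^4 + (4\nu + 2)(p^3 - p^4)\big) \le 2sp^2$$

on the assumption that $4\nu + 2 \le 1/p$. We then use Chebyshev's inequality to obtain

$$\Pr\big[|s_Z - sp^2| < \epsilon p^2 \cdot \max\{n^{2/5}, s\}\big] \ge 1 - \frac{2sp^2}{\big(\epsilon p^2 \cdot \max\{n^{2/5}, s\}\big)^2} \ge 9/10. \tag{2}$$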


Next we show that $h_Z/p$ is a sufficiently good estimate for $h$. Let $H$ denote the set of $h$ heavy vertices in $G$ and define an indicator random variable $Y_v$ for each $v \in H$, where $Y_v = 1$ iff $v \in Z$. Note that $h_Z = \sum_{v \in H} Y_v$ and $\mathbb{E}[h_Z] = hp$. Then, by an application of the Chernoff-Hoeffding bound,

$$\Pr\big[|h_Z - hp| \ge \epsilon p \max\{h, n^{2/5}\}\big] \le 2\exp(-\epsilon^2 p n^{2/5}/3) \le 1/10. \tag{3}$$

Therefore, it follows from Eq. (2) and (3) that $\Pr\big[|\max\{h_Z/p, s_Z/p^2\} - \max\{s, h\}| \le \epsilon \max\{n^{2/5}, s, h\}\big] \ge 4/5$.

□
Theorem 21. There exists an $\tilde{O}(\nu \epsilon^{-2} n^{4/5})$-space dynamic graph stream algorithm that returns a $(5\nu + 9)(1 + \epsilon)^2$ approximation of $\mathrm{match}(G)$ with probability at least $1 - \delta$, where $\nu$ is the arboricity of $G$.


Proof. To argue the approximation factor, first suppose $\mathrm{match}(G) \le 2n^{2/5}$. In this case $r = \mathrm{match}(G)$ and $\max\{s, h\} \le (2.5\nu + 4.5)\,\mathrm{match}(G)$ by appealing to Lemma 19. Hence,

$$\mathrm{match}(G) \le \max\{r, h_Z/p, s_Z/p^2\} \le (2.5\nu + 4.5)\,\mathrm{match}(G).$$

Next suppose $\mathrm{match}(G) > 2n^{2/5}$. In this case, $\max\{s, h\} > n^{2/5}$ by Lemma 19. Therefore, by Lemma 20, $\max\{h_Z/p, s_Z/p^2\} = (1 \pm \epsilon)\max\{s, h\}$, and so

$$\frac{\mathrm{match}(G)}{2(1 + \epsilon)} \le \max\{r, h_Z/p, s_Z/p^2\} \le (1 + \epsilon)\max\{s, h\} \le (1 + \epsilon)(2.5\nu + 4.5)\,\mathrm{match}(G).$$

To argue the space bound, recall that the algorithm used in Theorem ?? requires $\tilde{O}(k^2) = \tilde{O}(n^{4/5})$ space. Note that $|Z| \le 2np$ with high probability. Hence, sampling the vertices $Z$ and maintaining a $2\nu|Z|$-sparse recovery sketch requires $\tilde{O}(\nu n p)$ space. □


C Lower Bounds 

C.1 Matching and Hitting Set Lower Bounds

The following theorem establishes that the space use of our matching, vertex cover, hitting set, and hypermatching algorithms is optimal up to logarithmic factors.

Theorem 22. Any (randomized) parametrized streaming algorithm for the minimum $d$-hitting set or maximum (hyper)matching problem with parameter $k$ requires $\Omega(k^d)$ space.

Proof. We reduce from the Membership communication problem:

Membership

Input: Alice has a set $X \subseteq [n]$, and Bob has an element $1 \le x \le n$.

Question: Bob wants to check whether $x \in X$.

There is a lower bound of $\Omega(n)$ bits of communication from Alice to Bob, even allowing randomization [?].

Let $S = s_1 s_2 \ldots s_n$ be the characteristic string of $X$, i.e., a binary string such that $s_i = 1$ iff $i \in X$. Let $k = n^{1/d}$. Fix a canonical bijection $h : [n] \to [k]^d$. This way we can view an $n$-bit string as an adjacency matrix of a $d$-partite hypergraph. Construct the following graph $G$ with $d$ vertex partitions $V_1, V_2, \ldots, V_d$:

• Each partition $V_i$ has $dk$ vertices: for each $j \in [k]$ create vertices $v^i_{j,1}, v^i_{j,2}, \ldots, v^i_{j,d}$.

• Alice inserts a hyperedge $(v^1_{j_1,1}, v^2_{j_2,1}, \ldots, v^d_{j_d,1})$ iff the corresponding bit in the string $S$ is 1, i.e., $s_a = 1$ where $h(a) = (j_1, j_2, \ldots, j_d)$.

• Let $h(x) = (J_1, J_2, \ldots, J_d)$. Bob inserts the edge $(v^i_{j,1}, v^i_{j,2}, \ldots, v^i_{j,d})$ for every $i \in [d]$ and $j \neq J_i$.
Alice runs the hitting set algorithm on the edges she is inserting using space $f(k)$. Then she sends the memory contents of the algorithm to Bob, who finishes running the algorithm on his edges.

The minimum hitting set should include the vertices $v^i_{j,1}$ such that $j \neq J_i$. If the edge $(v^1_{J_1,1}, v^2_{J_2,1}, \ldots, v^d_{J_d,1})$ is in the graph, we also need to include one of its vertices. Therefore,

$$x \in X \iff s_x = 1 \iff (v^1_{J_1,1}, \ldots, v^d_{J_d,1}) \in G \iff \mathrm{hs}(G) = dk - d + 1.$$

On the other hand,

$$x \notin X \iff s_x = 0 \iff \mathrm{hs}(G) = dk - d.$$

Alice only sends $f(k)$ bits to Bob. Therefore, $f(k) = \Omega(n) = \Omega(k^d)$.

For the lower bound on matching we use the same construction. For each vertex $v^i_{j,1}$ such that $j \neq J_i$, the maximum matching should include $(v^i_{j,1}, v^i_{j,2}, \ldots, v^i_{j,d})$. If the edge $(v^1_{J_1,1}, \ldots, v^d_{J_d,1})$ is in the graph, we include it in the matching as well. Therefore,

$$x \in X \iff s_x = 1 \iff (v^1_{J_1,1}, v^2_{J_2,1}, \ldots, v^d_{J_d,1}) \in G \iff \mathrm{match}(G) = dk - d + 1$$

and

$$x \notin X \iff s_x = 0 \iff (v^1_{J_1,1}, v^2_{J_2,1}, \ldots, v^d_{J_d,1}) \notin G \iff \mathrm{match}(G) = dk - d.$$

□
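To check the two case distinctions above on small inputs, the Python sketch below builds the graph for $d = 2$ (so the hitting set problem is vertex cover) and computes both quantities by brute force. Assumptions, all ours: everything is 0-indexed, so $X \subseteq \{0, \ldots, k^2 - 1\}$ and the canonical bijection $h(a)$ is taken to be `divmod(a, k)`; vertex $v^i_{j,t}$ is represented by the tuple `(i, j, t)`. The helper names are hypothetical.

```python
from itertools import combinations


def membership_instance(k, X, x):
    """Edge list of G for d = 2; vertex (i, j, t) stands for v^i_{j,t}."""
    edges = []
    for a in X:                       # Alice: one edge per 1-bit of S
        j1, j2 = divmod(a, k)         # canonical bijection [n] -> [k]^2
        edges.append(((1, j1, 1), (2, j2, 1)))
    J = divmod(x, k)                  # h(x) = (J_1, J_2)
    for i in (1, 2):                  # Bob: edges v^i_{j,1} v^i_{j,2} for j != J_i
        for j in range(k):
            if j != J[i - 1]:
                edges.append(((i, j, 1), (i, j, 2)))
    return edges


def min_vertex_cover_size(edges):
    """Brute force over vertex subsets, smallest first (tiny instances only)."""
    verts = sorted({w for e in edges for w in e})
    for size in range(len(verts) + 1):
        for cand in combinations(verts, size):
            chosen = set(cand)
            if all(u in chosen or v in chosen for u, v in edges):
                return size
    return len(verts)


def max_matching_size(edges):
    """Brute force over edge subsets, largest first (tiny instances only)."""
    for size in range(len(edges), 0, -1):
        for cand in combinations(edges, size):
            ends = [w for e in cand for w in e]
            if len(ends) == len(set(ends)):
                return size
    return 0


E = membership_instance(2, [0, 3], 3)
print(min_vertex_cover_size(E), max_matching_size(E))  # 3 3
```

For $k = 2$, both functions return $dk - d + 1 = 3$ exactly when $x \in X$ and $dk - d = 2$ otherwise, matching the equivalences above.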


C.2 Lower Bounds for Problems considered by Fafianie and Kratsch [19]

Comparison with Lower Bounds for Streaming Kernels: Fafianie and Kratsch [19] introduced the notion of kernelization in the streaming setting as follows:


Definition 23. A 1-pass streaming kernelization algorithm receives an input $(x, k)$ and returns a kernel, with the restriction that the space usage of the algorithm is bounded by $p(k) \cdot \log|x|$ for some polynomial $p$.


Fafianie and Kratsch [19] gave lower bounds for several parameterized problems. In particular, they showed that:

• Any 1-pass kernel for Edge Dominating Set(k) requires $\Omega(m)$ bits, where $m$ is the number of edges. However, there is a 2-pass kernel which uses $\mathrm{poly}(k) \cdot \log n$ bits of local memory and $\mathrm{poly}(k)$ time in each step, and returns an equivalent instance of size $\mathrm{poly}(k) \cdot \log k$.

• The lower bound of $\Omega(m)$ bits for any 1-pass kernel also holds for several other problems such as Cluster Editing(k), Cluster Deletion(k), Cluster Vertex Deletion(k), Cograph Vertex Deletion(k), Minimum Fill-In(k), Edge Bipartization(k), Feedback Vertex Set(k), Odd Cycle Transversal(k), Triangle Edge Deletion(k), Triangle Vertex Deletion(k), Triangle Packing(k), s-Star Packing(k), and Bipartite Colorful Neighborhood(k).

• Any $t$-pass kernel for Cluster Editing(k) and Minimum Fill-In(k) requires $\Omega(n/t)$ space.
In this section, we give $\Omega(n)$ lower bounds for the space complexity of all the problems considered by Fafianie and Kratsch. In addition, we also consider some other problems, such as Path(k), which were not considered by Fafianie and Kratsch. A simple observation shows that any lower bound for parameterized streaming kernels also transfers to parameterized streaming algorithms. Thus the results of Fafianie and Kratsch [19] also give lower bounds for parameterized streaming algorithms for these problems. However, our lower bounds have the following advantages over the results of [19]:

• All our lower bounds also hold for randomized algorithms, whereas the kernel lower bounds were for deterministic algorithms.

• With the exception of Edge Dominating Set(k), all our lower bounds also hold for a constant number of passes.

C.2.1 Lower Bound for Edge Dominating Set 

We now show a lower bound for the Edge Dominating Set(k) problem.

Definition 24. Given a graph $G = (V, E)$, we say that a set of edges $X \subseteq E$ is an edge dominating set if every edge in $E \setminus X$ is incident on some edge of $X$.

Edge Dominating Set(k)    Parameter: k

Input: An undirected graph $G$ and an integer $k$

Question: Does there exist an edge dominating set $X \subseteq E$ of size at most $k$?

Theorem 25. For the Edge Dominating Set(k) problem, any (randomized) streaming algorithm needs $\Omega(n)$ space.

Proof. Given an instance of Membership, we create a graph $G$ on $n + 2$ vertices as follows. For each $i \in [n]$ we create a vertex $v_i$. We also add two special vertices $a$ and $b$. For every $y \in X$, add the edge $(a, v_y)$. Finally, add the edge $(b, v_x)$.

Now we will show that $G$ has an edge dominating set of size 1 iff Membership answers YES. In the first direction, suppose that $G$ has an edge dominating set of size 1. Then it must be the case that $x \in X$: otherwise no single edge is adjacent both to the star incident on $a$ and to the edge $(b, v_x)$, so a minimum edge dominating set needs one extra edge to dominate the star in addition to an edge dominating $(b, v_x)$. Hence Membership answers YES. In the reverse direction, suppose that Membership answers YES. Then the edge $(a, v_x)$ is clearly an edge dominating set of size 1.

Therefore, any (randomized) streaming algorithm that can determine whether a graph has an edge dominating set of size at most $k = 1$ gives a communication protocol for Membership, and hence requires $\Omega(n)$ space. □
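The reduction can be checked exhaustively for small $n$ with the Python sketch below (our own helper names; vertex $v_i$ is represented by the integer $i$, and $a$, $b$ by the strings `"a"`, `"b"`). Note that the argument implicitly assumes $X \neq \emptyset$: if $X = \emptyset$, the single edge $(b, v_x)$ dominates itself, so the check ranges over non-empty sets.

```python
from itertools import combinations


def eds_instance(X, x):
    """Edge list of G: a star from 'a' to every v_y with y in X, plus the edge (b, v_x)."""
    edges = [("a", y) for y in X]
    edges.append(("b", x))
    return edges


def has_edge_dominating_set(edges, k):
    """Brute force: is there a set of at most k edges sharing an endpoint with every edge?"""
    for size in range(k + 1):
        for cand in combinations(edges, size):
            touched = {w for e in cand for w in e}
            if all(u in touched or v in touched for u, v in edges):
                return True
    return False


print(has_edge_dominating_set(eds_instance([0, 2], 2), 1))  # True: x in X
print(has_edge_dominating_set(eds_instance([0, 2], 1), 1))  # False: x not in X
```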

C.2.2 Lower Bound for $\mathcal{G}$-Free Deletion

Definition 26. A set of connected graphs $\mathcal{G}$ is bad if there is a minimal (under the operation of taking subgraphs) graph $H \in \mathcal{G}$ such that $2P_2 \subseteq H$, where $P_2$ is a path on 2 vertices.

For any bad set of graphs $\mathcal{G}$, we now show a lower bound for the following general problem:

$\mathcal{G}$-Free Deletion(k)    Parameter: k

Input: A bad set of graphs $\mathcal{G}$, an undirected graph $G = (V, E)$ and an integer $k$

Question: Does there exist a set $X \subseteq V$ of size at most $k$ such that $G \setminus X$ contains no graph from $\mathcal{G}$?
The reduction is from the Disjointness problem in communication complexity.

Disjointness

Input: Alice has a string $x \in \{0,1\}^n$ given by $x_1 x_2 \cdots x_n$. Bob has a string $y \in \{0,1\}^n$ given by $y_1 y_2 \cdots y_n$.

Question: Bob wants to check if there exists $i \in [n]$ such that $x_i = y_i = 1$.

There is a lower bound of $\Omega(n/p)$ bits of communication between Alice and Bob, even allowing $p$ rounds and randomization [?].

Theorem 27. For a bad set of graphs $\mathcal{G}$, any $p$-pass (randomized) streaming algorithm for the $\mathcal{G}$-Free Deletion problem needs $\Omega(n/p)$ space.


Proof. Given an instance of Disjointness, we create a graph $G$ which consists of $n$ disjoint copies, say $G_1, G_2, \ldots, G_n$, of $H' := H \setminus 2P_2$. Let the two edges removed from $H$ to get $H'$ be $e_1$ and $e_2$. For each $i \in [n]$, to the copy $G_i$ of $H'$ we add the edge $e_1$ iff $x_i = 1$ and the edge $e_2$ iff $y_i = 1$. We now show that the resulting graph $G$ contains a copy of $H$ if and only if Disjointness answers YES.

Suppose that Disjointness answers YES. So there is a $j \in [n]$ such that $x_j = 1 = y_j$. Therefore, to the copy $G_j$ of $H'$ we would have added the edges $e_1$ and $e_2$, which would complete it into $H$. So $G$ contains a copy of $H$. In the other direction, suppose that $G$ contains a copy of $H$. Note that since we add $n$ disjoint copies of $H'$ and add at most two edges ($e_1$ and $e_2$) to each copy, it follows that each connected component of $G$ is in fact a subgraph of $H = H' \cup \{e_1, e_2\}$. Since $H$ is connected and $G$ contains a copy of $H$, some connected component of $G$ must be exactly the graph $H$, i.e., to some copy $G_i$ of $H'$ we must have added both the edges $e_1$ and $e_2$. This implies $x_i = 1 = y_i$, and so Disjointness answers YES.

Since each connected component of $G$ is a subgraph of $H$, the minimality of $H$ implies that $G$ contains a graph from $\mathcal{G}$ iff $G$ contains a copy of $H$, which in turn is true iff Disjointness answers YES. Therefore, any $p$-pass (randomized) streaming algorithm that can determine whether a graph is $\mathcal{G}$-free (i.e., answers the question with $k = 0$) gives a communication protocol for Disjointness, and hence requires $\Omega(n/p)$ space. □
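For the simplest bad family $\mathcal{G} = \{C_3\}$ with $H = C_3$ (so $H'$ is a single edge plus an isolated vertex, and $e_1, e_2$ close the triangle), the construction can be sketched in Python as follows. The helper names and the numbering $3i, 3i+1, 3i+2$ for the vertices of copy $G_i$ are our own choices.

```python
from collections import defaultdict


def triangle_instance(x, y):
    """Disjointness -> triangle detection with H = C_3; copy i uses vertices 3i, 3i+1, 3i+2."""
    edges = []
    for i, (xi, yi) in enumerate(zip(x, y)):
        p, q, r = 3 * i, 3 * i + 1, 3 * i + 2
        edges.append((p, q))      # H' = C_3 minus e_1, e_2: a single edge (r isolated)
        if xi:
            edges.append((q, r))  # e_1, inserted by Alice iff x_i = 1
        if yi:
            edges.append((p, r))  # e_2, inserted by Bob iff y_i = 1
    return edges


def has_triangle(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # An edge uv lies on a triangle iff u and v have a common neighbour.
    return any(adj[u] & adj[v] for u, v in edges)


print(has_triangle(triangle_instance([1, 0, 1], [0, 1, 1])))  # True: x_2 = y_2 = 1
print(has_triangle(triangle_instance([1, 0, 0], [0, 1, 1])))  # False: disjoint
```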


This implies lower bounds for the following set of problems:

Theorem 28. For each of the following problems, any $p$-pass (randomized) algorithm requires $\Omega(n/p)$ space: Feedback Vertex Set(k), Odd Cycle Transversal(k), Even Cycle Transversal(k) and Triangle Deletion(k).

Proof. We first define the problems below:

Feedback Vertex Set(k)    Parameter: k

Input: An undirected graph $G = (V, E)$ and an integer $k$

Question: Does there exist a set $X \subseteq V$ of size at most $k$ such that $G \setminus X$ has no cycles?

Odd Cycle Transversal(k)    Parameter: k

Input: An undirected graph $G = (V, E)$ and an integer $k$

Question: Does there exist a set $X \subseteq V$ of size at most $k$ such that $G \setminus X$ has no odd cycles?

Even Cycle Transversal(k)    Parameter: k

Input: An undirected graph $G = (V, E)$ and an integer $k$

Question: Does there exist a set $X \subseteq V$ of size at most $k$ such that $G \setminus X$ has no even cycles?


Triangle Deletion(k)    Parameter: k

Input: An undirected graph $G = (V, E)$ and an integer $k$

Question: Does there exist a set $X \subseteq V$ of size at most $k$ such that $G \setminus X$ has no triangles?

Now we show how each of these problems can be viewed as a $\mathcal{G}$-Free Deletion problem for an appropriate choice of bad $\mathcal{G}$.

• Feedback Vertex Set(k): Take $\mathcal{G} = \{C_3, C_4, C_5, \ldots\}$ and $H = C_3$.

• Odd Cycle Transversal(k): Take $\mathcal{G} = \{C_3, C_5, C_7, \ldots\}$ and $H = C_3$.

• Even Cycle Transversal(k): Take $\mathcal{G} = \{C_4, C_6, C_8, \ldots\}$ and $H = C_4$.

• Triangle Deletion(k): Take $\mathcal{G} = \{C_3\}$ and $H = C_3$.

We verify the conditions for Feedback Vertex Set(k); the proofs for the other problems are similar. Note that the choice of $\mathcal{G} = \{C_3, C_4, C_5, \ldots\}$ and $H = C_3$ implies that $\mathcal{G}$ is bad, since each graph in $\mathcal{G}$ is connected, and the graph $H$ belongs to $\mathcal{G}$ and is a minimal element of $\mathcal{G}$ (under the operation of taking subgraphs). Finally, finding a set $X$ such that the graph $G \setminus X$ is $\mathcal{G}$-free implies that it has no cycles, i.e., $X$ is a feedback vertex set for $G$. □

It is easy to see that the same proofs also work for the edge deletion versions of the Odd Cycle Transversal(k), Even Cycle Transversal(k) and Triangle Deletion(k) problems.

C.2.3 $\mathcal{G}$-Editing

Definition 29. A set of graphs $\mathcal{G}$ is good if there is a minimal (under the operation of taking subgraphs) connected graph $H \in \mathcal{G}$ such that $2P_2 \subseteq H$, where $P_2$ is a path on 2 vertices.

For any good set of graphs $\mathcal{G}$, we now show a lower bound for the following general problem:

$\mathcal{G}$-Editing(k)    Parameter: k

Input: A graph class $\mathcal{G}$, an undirected graph $G = (V, E)$ and an integer $k$

Question: Does there exist a set $X$ of $k$ edges such that $(V, E \cup X)$ contains a graph from $\mathcal{G}$?

Theorem 30. For a good set of graphs $\mathcal{G}$, any $p$-pass (randomized) streaming algorithm for the $\mathcal{G}$-Editing(k) problem needs $\Omega(n/p)$ space.

Proof. Given an instance of Disjointness, we create a graph $G$ which consists of $n$ disjoint copies, say $G_1, G_2, \ldots, G_n$, of $H' := H \setminus 2P_2$. By the minimality of $H$, it follows that $H' \notin \mathcal{G}$. Let the two edges removed from $H$ to get $H'$ be $e_1$ and $e_2$. For each $i \in [n]$ we add to $G_i$ the edge $e_1$ iff $x_i = 1$ and the edge $e_2$ iff $y_i = 1$. Let the resulting graph be $G$.

We now show that $G$ contains a copy of $H$ if and only if Disjointness answers YES. Suppose that $G$ contains a copy of $H$. Note that since we add $n$ disjoint copies of $H'$ and add at most two edges ($e_1$ and $e_2$) to each copy, it follows that each connected component of $G$ is in fact a subgraph of $H = H' \cup \{e_1, e_2\}$. Since $H$ is connected and $G$ contains a copy of $H$, some connected component of $G$ must be exactly the graph $H$, i.e., to some copy $G_i$ of $H'$ we must have added both the edges $e_1$ and $e_2$. This implies $x_i = 1 = y_i$, and so Disjointness answers YES. Now suppose that Disjointness answers YES, i.e., there exists $j \in [n]$ such that $x_j = 1 = y_j$. Therefore, to the copy $G_j$ of $H'$ we would have added the edges $e_1$ and $e_2$, which would complete it into $H$. So $G$ contains a copy of $H$.

Otherwise, due to the minimality of $H$, the graph $G$ does not contain any graph from $\mathcal{G}$. Therefore, any $p$-pass (randomized) streaming algorithm that can determine whether a graph $G$ contains a graph from $\mathcal{G}$ (i.e., answers the question with $k = 0$) gives a communication protocol for Disjointness, and hence requires $\Omega(n/p)$ space. □

This implies lower bounds for the following set of problems:

Theorem 31. For each of the following problems, any $p$-pass (randomized) algorithm requires $\Omega(n/p)$ space: Triangle Packing(k), s-Star Packing(k) and Path(k).

Proof. We first define the problems below:

Triangle Packing(k)    Parameter: k

Input: An undirected graph $G = (V, E)$ and an integer $k$

Question: Do there exist at least $k$ vertex-disjoint triangles in $G$?

s-Star Packing(k)    Parameter: k

Input: An undirected graph $G = (V, E)$ and an integer $k$

Question: Do there exist at least $k$ vertex-disjoint instances of $K_{1,s}$ in $G$ (where $s \ge 3$)?

Path(k)    Parameter: k

Input: An undirected graph $G = (V, E)$ and an integer $k$

Question: Does there exist a path in $G$ of length at least $k$?

Now we show how each of these problems can be viewed as a $\mathcal{G}$-Editing problem for an appropriate choice of good $\mathcal{G}$.

• Triangle Packing(k) with $k = 1$: Take $\mathcal{G} = \{C_3\}$ and $H = C_3$.

• s-Star Packing(k) with $k = 1$: Take $\mathcal{G} = \{K_{1,s}\}$ and $H = K_{1,s}$.

• Path(k) with $k = 3$: Take $\mathcal{G} = \{P_3, P_4, P_5, \ldots\}$ and $H = P_3$.

We verify the conditions for Triangle Packing(k) with $k = 1$; the proofs for the other problems are similar. Note that the choice of $\mathcal{G} = \{C_3\}$ and $H = C_3$ implies that $\mathcal{G}$ is good, since $\mathcal{G}$ only contains one graph. Finally, finding a set of edges $X$ such that the graph $(V, E \cup X)$ contains a graph from $\mathcal{G}$ implies that it has at least one $C_3$, i.e., $X$ is a solution for Triangle Packing(k) with $k = 1$. □

C.2.4 Lower Bound for Cluster Vertex Deletion 

We now show a lower bound for the CLUSTER Vertex Deletion(A) problem. 

Definition 32 . We say that G is a cluster graph if each connected component ofG is a clique. 


Cluster Vertex Deletion(A) Parameter, k 

Input: An undirected graph G = (V, E) and an integer k 

Question: Does there exist a set X C 1/ of size at most k such that G \ X is a cluster graph? 
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Figure 1: Gadget for reduction from DiSJOiNTNESS to CLUSTER Vertex Deletion 


Theorem 33. For the Cluster Vertex Deletion($k$) problem, any $p$-pass (randomized) streaming algorithm needs $\Omega(n/p)$ space.

Proof. Given an instance of Disjointness, we create a graph $G$ on $3n$ vertices as follows. For each $i \in [n]$ we create three vertices $a_i, b_i, c_i$. Insert the edge $(a_i, c_i)$ iff $x_i = 1$ and the edge $(b_i, c_i)$ iff $y_i = 1$. This is illustrated in Figure 1.

Now we show that each connected component of $G$ is a clique iff Disjointness answers NO. For the first direction, suppose that each connected component of $G$ is a clique. Then there cannot exist $i \in [n]$ such that $x_i = 1 = y_i$, because then the vertices $a_i, b_i, c_i$ would form a connected component that is a $P_3$; this contradicts the assumption that each connected component of $G$ is a clique. For the reverse direction, suppose that Disjointness answers NO. Then for each $i$ at most one of the edges $(a_i, c_i)$ and $(b_i, c_i)$ is present, so each connected component of $G$ is either a $P_1$ or a $P_2$, both of which are cliques.

Therefore, any $p$-pass (randomized) streaming algorithm that can determine whether a graph is a cluster graph (i.e., answer the question with $k = 0$) gives a communication protocol for Disjointness, and hence requires $\Omega(n/p)$ space. □
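The gadget of Theorem 33 is small enough to sanity-check mechanically. The Python sketch below is only an illustration, not part of the proof: the function names (`cluster_gadget`, `is_cluster_graph`, `intersects`) are hypothetical, inputs $x, y$ are assumed to be 0/1 tuples, and, following the convention of these proofs, Disjointness is taken to answer YES exactly when some index has $x_i = y_i = 1$. It builds the graph of Figure 1 and checks, for every input pair of length 3, that $G$ is a cluster graph iff Disjointness answers NO.

```python
from itertools import product


def cluster_gadget(x, y):
    """Gadget of Theorem 33 (sketch): vertices a_i, b_i, c_i; edge (a_i, c_i)
    iff x_i = 1 and edge (b_i, c_i) iff y_i = 1. Returns an adjacency dict."""
    adj = {(t, i): set() for i in range(len(x)) for t in "abc"}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i in range(len(x)):
        if x[i]:
            add_edge(("a", i), ("c", i))
        if y[i]:
            add_edge(("b", i), ("c", i))
    return adj


def is_cluster_graph(adj):
    """True iff every connected component is a clique."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Depth-first search collects the component containing `start`.
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        # In a clique, each vertex is adjacent to every other vertex of its component.
        if any(adj[u] != comp - {u} for u in comp):
            return False
    return True


def intersects(x, y):
    """Disjointness answers YES (convention of these proofs) iff some x_i = y_i = 1."""
    return any(xi and yi for xi, yi in zip(x, y))


# Exhaustive check of "G is a cluster graph iff Disjointness answers NO" for n = 3.
for x in product((0, 1), repeat=3):
    for y in product((0, 1), repeat=3):
        assert is_cluster_graph(cluster_gadget(x, y)) == (not intersects(x, y))
```

An exhaustive check over a small $n$ is no substitute for the argument above; it only guards against a slip in the gadget's edge rules.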

C.2.5 Lower Bound for Minimum Fill-In 

We now show a lower bound for the Minimum Fill-In($k$) problem.

Definition 34. We say that $G$ is a chordal graph if it does not contain an induced cycle of length at least 4.


Minimum Fill-In($k$)    Parameter: $k$

Input: An undirected graph G = (V, E) and an integer k 

Question: Does there exist a set $X$ of at most $k$ edges such that $(V, E \cup X)$ is a chordal graph?


Theorem 35. For the Minimum Fill-In($k$) problem, any $p$-pass (randomized) streaming algorithm needs $\Omega(n/p)$ space.

Proof. We reduce from the Disjointness problem in communication complexity. Given an instance of Disjointness, we create a graph $G$ on $4n$ vertices as follows. For each $i \in [n]$ we create vertices $a_i, b_i, c_i, d_i$ and insert the edges $(a_i, b_i)$ and $(c_i, d_i)$. Insert the edge $(a_i, c_i)$ iff $x_i = 1$ and the edge $(b_i, d_i)$ iff $y_i = 1$. This is illustrated in Figure 2.

Footnote: It is easy to see that the proof of Theorem 33 also works for the problems Cluster Edge Deletion($k$), where we can delete at most $k$ edges, and Cluster Editing($k$), where we can delete or add at most $k$ edges.

Figure 2: Gadget for reduction from Disjointness to Minimum Fill-In

Figure 3: Gadget for reduction from Disjointness to Cograph Vertex Deletion

Now we show that $G$ is chordal iff Disjointness answers NO. For the first direction, suppose that $G$ is chordal. Then there cannot exist $i \in [n]$ such that $x_i = 1 = y_i$, because then the vertices $a_i, b_i, c_i, d_i$ would form an induced $C_4$, contradicting the fact that $G$ is chordal. For the reverse direction, suppose that Disjointness answers NO. Then for each $i$ at most one of the edges $(a_i, c_i)$ and $(b_i, d_i)$ is present, so each connected component of $G$ is a path (a $P_2$ or a $P_4$). Hence $G$ is acyclic and in particular has no induced cycle of length at least 4, i.e., $G$ is chordal.

Therefore, any $p$-pass (randomized) streaming algorithm that can determine whether a graph is a chordal graph (i.e., answer the question with $k = 0$) gives a communication protocol for Disjointness, and hence requires $\Omega(n/p)$ space. □
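The Minimum Fill-In gadget can be checked the same way. In the Python sketch below, the names `fill_in_gadget`, `is_chordal` and `intersects` are hypothetical, and the gadget uses the edge $(b_i, d_i)$ for $y_i = 1$, which is the edge needed for $a_i, b_i, d_i, c_i$ to form the induced $C_4$ used in the proof. Instead of searching for induced cycles directly, the sketch swaps in a standard chordality test: a graph is chordal iff one can repeatedly delete a simplicial vertex (one whose neighbours are pairwise adjacent) until nothing is left. Disjointness is again assumed to answer YES iff some $x_i = y_i = 1$.

```python
from itertools import product


def fill_in_gadget(x, y):
    """Gadget of Theorem 35 (sketch): edges (a_i, b_i) and (c_i, d_i) always;
    (a_i, c_i) iff x_i = 1 and (b_i, d_i) iff y_i = 1."""
    adj = {(t, i): set() for i in range(len(x)) for t in "abcd"}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i in range(len(x)):
        add_edge(("a", i), ("b", i))
        add_edge(("c", i), ("d", i))
        if x[i]:
            add_edge(("a", i), ("c", i))
        if y[i]:
            add_edge(("b", i), ("d", i))
    return adj


def is_chordal(adj):
    """Chordal iff repeatedly deleting a simplicial vertex empties the graph."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    while adj:
        simplicial = next(
            (v for v, nb in adj.items()
             if all(w in adj[u] for u in nb for w in nb if u != w)),
            None,
        )
        if simplicial is None:
            return False  # no simplicial vertex left, so an induced cycle of length >= 4 exists
        for u in adj[simplicial]:
            adj[u].discard(simplicial)
        del adj[simplicial]
    return True


def intersects(x, y):
    return any(xi and yi for xi, yi in zip(x, y))


# Exhaustive check of "G is chordal iff Disjointness answers NO" for n = 3.
for x in product((0, 1), repeat=3):
    for y in product((0, 1), repeat=3):
        assert is_chordal(fill_in_gadget(x, y)) == (not intersects(x, y))
```

The elimination test is quadratic per step, which is irrelevant at this size; it was chosen because its correctness rests on a well-known characterization of chordal graphs rather than on enumerating cycles.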

C.2.6 Lower Bound for Cograph Vertex Deletion 

We now show a lower bound for the Cograph Vertex Deletion($k$) problem.

Definition 36. We say that $G$ is a cograph if it does not contain an induced $P_4$.


Cograph Vertex Deletion($k$)    Parameter: $k$

Input: An undirected graph $G = (V, E)$ and an integer $k$

Question: Does there exist a set $X \subseteq V$ of size at most $k$ such that $G \setminus X$ is a cograph?
Figure 4: Gadget for reduction from Disjointness to Bipartite Colorful Neighborhood


Theorem 37. For the Cograph Vertex Deletion($k$) problem, any $p$-pass (randomized) streaming algorithm needs $\Omega(n/p)$ space.

Proof. We reduce from the Disjointness problem in communication complexity. Given an instance of Disjointness, we create a graph $G$ on $4n$ vertices as follows. For each $i \in [n]$ we create vertices $a_i, b_i, c_i, d_i$ and insert the edge $(a_i, b_i)$. Insert the edge $(a_i, c_i)$ iff $x_i = 1$ and the edge $(b_i, d_i)$ iff $y_i = 1$. This is illustrated in Figure 3.

Now we show that $G$ has an induced $P_4$ if and only if Disjointness answers YES. For the first direction, suppose that $G$ has an induced $P_4$. Each connected component of $G$ has at most 4 vertices and is a subgraph of the path $c_i - a_i - b_i - d_i$ for some $i \in [n]$, so the $P_4$ must be exactly this path. By construction of $G$, this implies that $x_i = 1 = y_i$, i.e., Disjointness answers YES. For the reverse direction, suppose that Disjointness answers YES. Then there exists $j \in [n]$ such that the edges $(a_j, c_j)$ and $(b_j, d_j)$ belong to $G$, and $G$ has the induced $P_4$ given by $c_j - a_j - b_j - d_j$.

Therefore, any $p$-pass (randomized) streaming algorithm that can determine whether a graph is a cograph (i.e., answer the question with $k = 0$) gives a communication protocol for Disjointness, and hence requires $\Omega(n/p)$ space. □
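A brute-force check of the cograph reduction is also short. The Python sketch below uses hypothetical names (`cograph_gadget`, `has_induced_p4`, `intersects`) and assumes the gadget edges $(a_i, b_i)$ always, $(a_i, c_i)$ iff $x_i = 1$ and $(b_i, d_i)$ iff $y_i = 1$, so that the path $c_i - a_i - b_i - d_i$ from the proof can appear. It tests every 4-vertex subset, using the fact that a graph on four vertices is a $P_4$ exactly when its degree sequence is $(1, 1, 2, 2)$ (the only other graphs with three edges on four vertices are $K_{1,3}$ and a triangle plus an isolated vertex).

```python
from itertools import combinations, product


def cograph_gadget(x, y):
    """Gadget of Theorem 37 (sketch): edge (a_i, b_i) always;
    (a_i, c_i) iff x_i = 1 and (b_i, d_i) iff y_i = 1."""
    adj = {(t, i): set() for i in range(len(x)) for t in "abcd"}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i in range(len(x)):
        add_edge(("a", i), ("b", i))
        if x[i]:
            add_edge(("a", i), ("c", i))
        if y[i]:
            add_edge(("b", i), ("d", i))
    return adj


def has_induced_p4(adj):
    """Brute force over all 4-vertex subsets."""
    for quad in combinations(adj, 4):
        # Degrees inside the induced subgraph on `quad`.
        degrees = sorted(sum(1 for w in quad if w in adj[u]) for u in quad)
        if degrees == [1, 1, 2, 2]:
            return True
    return False


def intersects(x, y):
    return any(xi and yi for xi, yi in zip(x, y))


# Exhaustive check of "G has an induced P_4 iff Disjointness answers YES" for n = 3.
for x in product((0, 1), repeat=3):
    for y in product((0, 1), repeat=3):
        assert has_induced_p4(cograph_gadget(x, y)) == intersects(x, y)
```

Enumerating all $\binom{4n}{4}$ subsets is only sensible for tiny $n$; here it keeps the check independent of the structural argument in the proof.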

C.2.7 Bipartite Colorful Neighborhood 

We now show a lower bound for the Bipartite Colorful Neighborhood($k$) problem.


Bipartite Colorful Neighborhood($k$)    Parameter: $k$

Input: A bipartite graph G = (A, B, E) and an integer k 

Question: Is there a 2-coloring of $B$ such that there exists a set $S \subseteq A$ of size at least $k$ such that each element of $S$ has at least one neighbor in $B$ of each color?


Theorem 38. For the Bipartite Colorful Neighborhood($k$) problem, any $p$-pass (randomized) streaming algorithm needs $\Omega(n/p)$ space.

Proof. We reduce from the Disjointness problem in communication complexity. Given an instance of Disjointness, we create a graph $G$ on $n + 2$ vertices as follows. For each $i \in [n]$ we create a vertex $v_i$. In addition, we have two special vertices $a$ and $b$. For each $i \in [n]$, insert the edge $(a, v_i)$ iff $x_i = 1$ and the edge $(b, v_i)$ iff $y_i = 1$. Let $A = \{v_1, v_2, \ldots, v_n\}$ and $B = \{a, b\}$. This is illustrated in Figure 4.
Now we show that $G$ is a YES-instance of Bipartite Colorful Neighborhood($k$) with $k = 1$ iff Disjointness answers YES. For the first direction, suppose that $G$ is a YES-instance with $k = 1$, and let $v_i \in A$ be an element that has at least one neighbor in $B$ of each color. Since $|B| = 2$, the vertices $a$ and $b$ must receive different colors and $v_i$ must be adjacent to both of them, i.e., $x_i = 1 = y_i$; hence Disjointness answers YES. For the reverse direction, suppose that Disjointness answers YES. Then there exists $j \in [n]$ such that $x_j = 1 = y_j$, so $v_j$ is adjacent to both $a$ and $b$. Consider the 2-coloring of $B$ that gives different colors to $a$ and $b$. Then $S = \{v_j\}$ satisfies the condition of having a neighbor of each color in $B$, and hence $G$ is a YES-instance of Bipartite Colorful Neighborhood($k$) with $k = 1$.

Therefore, any $p$-pass (randomized) streaming algorithm that can solve Bipartite Colorful Neighborhood($k$) with $k = 1$ gives a communication protocol for Disjointness, and hence requires $\Omega(n/p)$ space. □
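The last reduction can be sanity-checked by solving Bipartite Colorful Neighborhood by brute force. In the Python sketch below, `colorful_gadget`, `colorful_neighborhood` and `intersects` are hypothetical names; the solver tries every 2-coloring of $B$ and counts the vertices of $A$ that see both colors, which suffices because any subset of those vertices is a valid $S$. As before, Disjointness is assumed to answer YES iff some $x_i = y_i = 1$.

```python
from itertools import product


def colorful_gadget(x, y):
    """Gadget of Theorem 38 (sketch): A = {v_i}, B = {a, b};
    edge (a, v_i) iff x_i = 1 and edge (b, v_i) iff y_i = 1."""
    A = [("v", i) for i in range(len(x))]
    B = ["a", "b"]
    adj = {v: set() for v in A}  # neighbours in B of each vertex of A
    for i, v in enumerate(A):
        if x[i]:
            adj[v].add("a")
        if y[i]:
            adj[v].add("b")
    return A, B, adj


def colorful_neighborhood(A, B, adj, k):
    """Brute force over all 2-colorings of B."""
    for colors in product((0, 1), repeat=len(B)):
        color = dict(zip(B, colors))
        # Vertices of A with at least one neighbour of each color.
        good = [v for v in A if {color[u] for u in adj[v]} == {0, 1}]
        if len(good) >= k:
            return True
    return False


def intersects(x, y):
    return any(xi and yi for xi, yi in zip(x, y))


# Exhaustive check of the k = 1 equivalence for n = 3.
for x in product((0, 1), repeat=3):
    for y in product((0, 1), repeat=3):
        A, B, adj = colorful_gadget(x, y)
        assert colorful_neighborhood(A, B, adj, 1) == intersects(x, y)
```

Because $|B| = 2$, only the two colorings that separate $a$ and $b$ can ever succeed, which mirrors the first direction of the proof.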